0
INTRODUCTION TO
ELASTICSEARCH
2

Agenda
• Me
• ElasticSearch Basics
• Concepts
• Network / Discovery
• Data Structure
• Inverted Index
• The REST API
• ...
3

Me
• Roy Russo
• JBoss Portal Co-Founder
• LoopFuse Co-Founder
• ElasticHQ
• http://www.elastichq.org
• AltiSource Labs...
4

ElasticSearch in One Slide
• Document - Oriented Search Engine
• JSON
• Apache Lucene
• No Schema
• Mapping Types
• Hor...
5

When to use ElasticSearch
• Full-Text Search
• Fast Read Database
• “Simple” Data Structures
• Minimize Impedance Misma...
6

When to use ElasticSearch - Logs
• Logstash + ElasticSearch + Kibana
7

How to use ElasticSearch - CQRS
Client

Command Sent
Ack Resp.

Remote Interfaces
Services
Domain Objects
Data
Storage
...
8

How to use ElasticSearch - CQRS
Request DTO
DTO Returned

Client

Command Sent
Ack Resp.

Remote Interfaces

Remote Int...
9

A note on Rivers
• JDBC
• CouchDB
• MongoDB
• RabbitMQ

• Twitter
• And more…

"type" : "jdbc",
"jdbc" : {
"driver" : "...
10

ElasticSearch at Work
REALTrans

REALServicing
REALSearch

ElasticSearch

REALDoc
11

What sucks about ElasticSearch
• No AUTH/AUTHZ
• No Usage Metrics
12

How the World Uses ElasticSearch
13

The Basics - Distro
• Download and Run
Executables

Node Configs

Data Storage

Log files

├── bin
│ ├── elasticsearch...
14

The Basics - Glossary
• Node = One ElasticSearch instance (1 java proc)
• Cluster = 1..N Nodes w/ same Cluster Name
• ...
15

The Basics - Document Structure
• Modeled as a JSON object
{

{
"genre": "Crime",
“language": "English",
"country": "U...
16

The Basics - Document Structure
• Document Metadata fields
• _id
• _type : mapping type
• _source : enabled/disabled
•...
17

The Basics - Document Structure
• Mapping:
• ES will auto-map (type) fields
• You can specify mapping, if needed
• Dat...
18

A Mapping Type
"imdb": {
"movie": {
"properties": {
"country": {
"type": "string“,
“store”:true,
“index”:false
},
"gen...
19

Lucene – Inverted Index
• Which presidential speeches contain the words “fair”
• Go over every speech, word by word, a...
20

Lucene – Inverted Index
• Inverting
• Take all the speeches
• Break them down by word (tokenize)
• For each word, stor...
21

Lucene – Inverted Index
• Not an algorithm
• Implementations vary
22

Cluster Topology
• 4 Node Cluster
• Index Configuration:
• “A”: 2 Shards, 1 Replica
• “B”: 3 Shards, 1 Replica

A1
B3
...
23

Building a Cluster
Start Cluster…
start cmd.exe /C elasticsearch -Des.node.name=Primus
start cmd.exe /C elasticsearch ...
24

Cluster State
• Cluster State
• Node Membership
• Indices Settings and Mappings (Types)
• Shard Allocation Table
• Sha...
25

Cluster State
• Changes in State published from Master to other nodes
1

(M)

3

2

PUT /newIndex
CS1

1

(M)

CS1

1
...
26

Discovery
• Nodes discover each other using multicast.
• Unicast is an option
discovery.zen.ping.multicast.enabled: fa...
27

The Basics - Shards
• Primary Shard:
• First time Indexing
• Index has 1..N primary shards (default: 5)
• # Not change...
28

Shard Auto-Allocation
• Add a node - Shards Relocate

Node 1

Node 2

0P

1P

1R

0R

Node 2

0R

• Shard Stages
• UNA...
29

The Basics – Searching
• How it works:
• Search request hits a node
• Node broadcasts to every shard in the index
• Ea...
30

The Basics - Shards
• Shard Allocation Awareness
• cluster.routing.allocation.awareness.attributes: rack_id
• Example:...
31

Nodes
• Master node handles cluster-wide (Meta-API) events:
• Node participation
• New indices create/delete
• Re-Allo...
32

REST API
• Create Index
• action.auto_create_index: 0
• Index Document
• Dynamic type mapping
• Versioning
• ID specif...
33

REST API – Versioning
• Every document is Versioned
• Version assigned on creation
• Version number can be assigned
34

REST API - Update
• Update using partial data
• Partial doc merged with existing
• Fails if document doesn’t exist
• “...
35

REST API
• Exists
• No overhead in loading
• Status Code Result
• Delete
• Get
• Multi-Get

{
"docs" : [
{
"_id" : "1"...
36

REST API - Search
• Free Text Search
• URL Request
• http://localhost:9200/imdb/movie/_search?q=scar*
• Complex Query
...
37

REST API - Search
• Search Types:
• http://localhost:9200/imdb/movie/_search?q=(scarface+OR+star)+A
ND+year:[1941+TO+1...
38

REST API – Query DSL
http://localhost:9200/imdb/movie/_search?q=(scarface+OR+star)+AND+year:[1981+TO+1984]

Becomes…
c...
39

REST API – Query DSL
• Query String Request use Lucene query syntax
• Limited
• Instead use “match” query

curl -XPOST...
40

REST API – Query DSL
• Match Query
{
“match”:{
“title”:{
“type”:“phrase”,
“query”:“quick fox”,
“slop”:1
}
}
}

• Boole...
41

REST API – Query DSL
• Range Query
• Numeric / Date Types
• Prefix/Wildcard Query
• Match on partial terms
• RegExp Qu...
42

REST API – Query DSL
• Geo_bbox
• Bounding box filter
• Geo_distance
• Geo_distance_range

{
"query":{
"filtered":{
"q...
43

REST API – Bulk Operations
• Bulk API
• Minimize round trips with index/delete ops
• Individual response for every req...
44

Percolate API
• Reversing Search
• Store queries and filter (percolate) documents through them.
• Useful for Alert/Mon...
45

Clients
• Client list: http://www.elasticsearch.org/guide/clients/
• Java Client, JS, PHP, Perl, Python, Ruby
• Spring...
46

B’what about Mongo?
• Mongo:
• General purpose DB
• ElasticSearch:
• Distributed text search engine

… that’s all I ha...
47

Questions?
Upcoming SlideShare
Loading in...5
×

ElasticSearch - DevNexus Atlanta - 2014

6,825

Published on

Introduction to ElasticSearch, delivered on 2/25/2014 at DevNexus conference in Atlanta.

Published in: Technology
0 Comments
16 Likes
Statistics
Notes
  • Be the first to comment

No Downloads
Views
Total Views
6,825
On Slideshare
0
From Embeds
0
Number of Embeds
28
Actions
Shares
0
Downloads
149
Comments
0
Likes
16
Embeds 0
No embeds

No notes for slide

Transcript of "ElasticSearch - DevNexus Atlanta - 2014"

  1. 1. INTRODUCTION TO ELASTICSEARCH
  2. 2. 2 Agenda • Me • ElasticSearch Basics • Concepts • Network / Discovery • Data Structure • Inverted Index • The REST API • Sample Deployment
  3. 3. 3 Me • Roy Russo • JBoss Portal Co-Founder • LoopFuse Co-Founder • ElasticHQ • http://www.elastichq.org • AltiSource Labs Architect
  4. 4. 4 ElasticSearch in One Slide • Document - Oriented Search Engine • JSON • Apache Lucene • No Schema • Mapping Types • Horizontal Scale, Distributed • REST API • Vibrant Ecosystem • Tooling, Plugins, Hosting, Client-Libs
  5. 5. 5 When to use ElasticSearch • Full-Text Search • Fast Read Database • “Simple” Data Structures • Minimize Impedance Mismatch
  6. 6. 6 When to use ElasticSearch - Logs • Logstash + ElasticSearch + Kibana
  7. 7. 7 How to use ElasticSearch - CQRS Client Command Sent Ack Resp. Remote Interfaces Services Domain Objects Data Storage Request DTO DTO Returned
  8. 8. 8 How to use ElasticSearch - CQRS Request DTO DTO Returned Client Command Sent Ack Resp. Remote Interfaces Remote Interfaces Services DTO Read Layer Domain Objects Event Storage ? Data Storage
  9. 9. 9 A note on Rivers • JDBC • CouchDB • MongoDB • RabbitMQ • Twitter • And more… "type" : "jdbc", "jdbc" : { "driver" : "com.mysql.jdbc.Driver", "url" : "jdbc:mysql://localhost:3306/my_db", "user" : "root", "password" : "mypassword", "sql" : "select * from products" }
  10. 10. 10 ElasticSearch at Work REALTrans REALServicing REALSearch ElasticSearch REALDoc
  11. 11. 11 What sucks about ElasticSearch • No AUTH/AUTHZ • No Usage Metrics
  12. 12. 12 How the World Uses ElasticSearch
  13. 13. 13 The Basics - Distro • Download and Run Executables Node Configs Data Storage Log files ├── bin │ ├── elasticsearch │ ├── elasticsearch.in.sh │ └── plugin ├── config │ ├── elasticsearch.yml │ └── logging.yml ├── data │ └── cluster1 ├── lib │ ├── elasticsearch-x.y.z.jar │ ├── ... │ └── └── logs ├── elasticsearch.log └── elasticsearch_index_search_slowlog.log └── elasticsearch_index_indexing_slowlog.log
  14. 14. 14 The Basics - Glossary • Node = One ElasticSearch instance (1 java proc) • Cluster = 1..N Nodes w/ same Cluster Name • Index = Similar to a DB • Named Collection of Documents • Maps to 1..N Primary shards && 0..N Replica shards • Mapping Type = Similar to a DB Table • Document Definition • Shard = One Lucene instance • Distributed across all nodes in the cluster.
  15. 15. 15 The Basics - Document Structure • Modeled as a JSON object { { "genre": "Crime", “language": "English", "country": "USA", "runtime": 170, "title": "Scarface", "year": 1983 "_index": "imdb", "_type": "movie", "_id": "u17o8zy9RcKg6SjQZqQ4Ow", "_version": 1, "_source": { "genre": "Crime", "language": "English", "country": "USA", "runtime": 170, "title": "Scarface", "year": 1983 } } }
  16. 16. 16 The Basics - Document Structure • Document Metadata fields • _id • _type : mapping type • _source : enabled/disabled • _timestamp • _ttl • _size : size of uncompressed _source • _version
  17. 17. 17 The Basics - Document Structure • Mapping: • ES will auto-map (type) fields • You can specify mapping, if needed • Data Types: • String • Number • Int, long, float, double, short, byte • Boolean • Datetime • formatted • geo_point, geo_shape • Array • Nested • IP
  18. 18. 18 A Mapping Type "imdb": { "movie": { "properties": { "country": { "type": "string“, “store”:true, “index”:false }, "genre": { "type": "string“, "null_value" : "na“, “store”:false, “index:true }, "year": { "type": "long" } } } }
  19. 19. 19 Lucene – Inverted Index • Which presidential speeches contain the words “fair” • Go over every speech, word by word, and mark the speeches that contain it • Fails at large scale
  20. 20. 20 Lucene – Inverted Index • Inverting • Take all the speeches • Break them down by word (tokenize) • For each word, store the IDs of the speeches • Sort all words (tokens) • Searching • Finding the word is fast • Iterate over document IDs that are referenced Token Doc Frequency Doc IDs Jobs 2 4,8 Fair 5 1,2,4,8,42 Bush 300 1,2,3,4,5,6, …
  21. 21. 21 Lucene – Inverted Index • Not an algorithm • Implementations vary
  22. 22. 22 Cluster Topology • 4 Node Cluster • Index Configuration: • “A”: 2 Shards, 1 Replica • “B”: 3 Shards, 1 Replica A1 B3 B2 A2 B1 B2 A1 B1 B3 A2
  23. 23. 23 Building a Cluster Start Cluster… start cmd.exe /C elasticsearch -Des.node.name=Primus start cmd.exe /C elasticsearch -Des.node.data=true -Des.node.master=false -Des.node.name=Slayer start cmd.exe /C elasticsearch -Des.node.data=true -Des.node.master=true -Des.node.name=Maiden Create Index… curl -XPUT 'http://localhost:9200/imdb/' -d '{ "settings" : { "index" : { "number_of_shards" : 3, "number_of_replicas" : 1 } } }' Index Document… curl -XPOST 'http://localhost:9200/imdb/movie/' -d '{ "genre": “Comedy", "language": "English", "country": "USA", "runtime": 99, "title": “Big Trouble in Little China", "year": 1986 }'
  24. 24. 24 Cluster State • Cluster State • Node Membership • Indices Settings and Mappings (Types) • Shard Allocation Table • Shard State • cURL -XGET http://localhost:9200/_cluster/state?pretty=1'
  25. 25. 25 Cluster State • Changes in State published from Master to other nodes 1 (M) 3 2 PUT /newIndex CS1 1 (M) CS1 1 (M) CS2 3 2 CS2 CS1 CS1 CS1 3 2 CS2 CS2
  26. 26. 26 Discovery • Nodes discover each other using multicast. • Unicast is an option discovery.zen.ping.multicast.enabled: false discovery.zen.ping.unicast.hosts: ["host1", "host2:port", "host3"] • Each cluster has an elected master node • Beware of split-brain
  27. 27. 27 The Basics - Shards • Primary Shard: • First time Indexing • Index has 1..N primary shards (default: 5) • # Not changeable once index created • Replica Shard: • Copy of the primary shard • Can be changed later • Each primary has 0..N replicas • HA: • Promoted to primary if primary fails • Get/Search handled by primary||replica
  28. 28. 28 Shard Auto-Allocation • Add a node - Shards Relocate Node 1 Node 2 0P 1P 1R 0R Node 2 0R • Shard Stages • UNASSIGNED • INITIALIZING • STARTED • RELOCATING
  29. 29. 29 The Basics – Searching • How it works: • Search request hits a node • Node broadcasts to every shard in the index • Each shard performs query • Each shard returns results • Results merged, sorted, and returned to client. • Problems: • ES has no idea where your document is • Broadcast query to 100 nodes • Performance degrades
  30. 30. 30 The Basics - Shards • Shard Allocation Awareness • cluster.routing.allocation.awareness.attributes: rack_id • Example: • • • • • 2 Nodes with node.rack_id=rack_one Create Index 5 shards / 1 replica (10 shards) Add 2 Nodes with node.rack_id=rack_two Shards RELOCATE to even distribution Primary & Replica will NOT be on the same rack_id value. • Shard Allocation Filtering • node.tag=val1 • index.routing.allocation.include.tag:val1,val2 curl -XPUT localhost:9200/newIndex/_settings -d '{ "index.routing.allocation.include.tag" : "val1,val2" }'
  31. 31. 31 Nodes • Master node handles cluster-wide (Meta-API) events: • Node participation • New indices create/delete • Re-Allocation of shards • Data Nodes • Indexing / Searching operations • Client Nodes • REST calls • Light-weight load balancers
  32. 32. 32 REST API • Create Index • action.auto_create_index: 0 • Index Document • Dynamic type mapping • Versioning • ID specification • Parent / Child (/1122?parent=1111)
  33. 33. 33 REST API – Versioning • Every document is Versioned • Version assigned on creation • Version number can be assigned
  34. 34. 34 REST API - Update • Update using partial data • Partial doc merged with existing • Fails if document doesn’t exist • “Upsert” data used to create a doc, if doesn’t exist { “upsert" : { “title": “Blade Runner” } }
  35. 35. 35 REST API • Exists • No overhead in loading • Status Code Result • Delete • Get • Multi-Get { "docs" : [ { "_id" : "1" "_index" : "imdb" "_type" : "movie" }, { "_id" : "5" "_index" : "oldmovies" "_type" : "movie" "_fields" " ["title", "genre"] } ] }
  36. 36. 36 REST API - Search • Free Text Search • URL Request • http://localhost:9200/imdb/movie/_search?q=scar* • Complex Query • http://localhost:9200/imdb/movie/_search?q=scarface+OR +star • http://localhost:9200/imdb/movie/_search?q=(scarface+O R+star)+AND+year:[1981+TO+1984]
  37. 37. 37 REST API - Search • Search Types: • http://localhost:9200/imdb/movie/_search?q=(scarface+OR+star)+A ND+year:[1941+TO+1984]&search_type=count • http://localhost:9200/imdb/movie/_search?q=(scarface+OR+star)+A ND+year:[1941+TO+1984]&search_type=query_then_fetch • Query and Fetch (fastest): • Executes on all shards and return results • Query then Fetch (default): • Executes on all shards. Only some information returned for rank/sort, only the relevant shards are asked for data
  38. 38. 38 REST API – Query DSL http://localhost:9200/imdb/movie/_search?q=(scarface+OR+star)+AND+year:[1981+TO+1984] Becomes… curl -XPOST 'localhost:9200/_search?pretty' -d '{ "query" : { "bool" : { "must" : [ { "query_string" : { "query" : "scarface or star" } }, { "range" : { "year" : { "gte" : 1931 } } } ] } } }'
  39. 39. 39 REST API – Query DSL • Query String Request use Lucene query syntax • Limited • Instead use “match” query curl -XPOST 'localhost:9200/_search?pretty' -d '{ "query" : { "bool" : { "must" : [ Automatically builds { a boolean query “match" : { “message" : “scarface star" } }, { "range" : { “year" : { "gte" : 1981 } } } ] …
  40. 40. 40 REST API – Query DSL • Match Query { “match”:{ “title”:{ “type”:“phrase”, “query”:“quick fox”, “slop”:1 } } } • Boolean Query • Must: document must match query • Must_not: document must not match query • Should: document doesn’t have to match • If it matches… higher score { "bool":{ "must":[ { "match":{ "color":"blue" } }, { "match":{ "title":"shirt" } } ], "must_not":[ { "match":{ "size":"xxl" } } ], "should":[ { "match":{ "textile":"cotton" }
  41. 41. 41 REST API – Query DSL • Range Query • Numeric / Date Types • Prefix/Wildcard Query • Match on partial terms • RegExp Query { "range":{ "founded_year":{ "gte":1990, "lt":2000 } } }
  42. 42. 42 REST API – Query DSL • Geo_bbox • Bounding box filter • Geo_distance • Geo_distance_range { "query":{ "filtered":{ "query":{ "match_all":{ } }, "filter":{ "geo_bbox":{ "location":{ "top_left":{ "lat":40.73, "lon":-74.1 }, "bottom_right":{ "lat":40.717, "lon":-73.99 } { "query":{ "filtered":{ "query":{ "match_all":{ } }, "filter":{ "geo_distance":{ "distance":"400km" "location":{ "lat":40.73, "lon":-74.1 } } …
  43. 43. 43 REST API – Bulk Operations • Bulk API • Minimize round trips with index/delete ops • Individual response for every request action • In order • Failure of one action will not stop subsequent actions. • localhost:9200/_bulk { "delete" : { "_index" : “imdb", "_type" : “movie", "_id" : "2" } }n { "index" : { "_index" : “imdb", "_type" : “actor", "_id" : "1" } }n { "first_name" : "Tony", "last_name" : "Soprano" }n ... { “update" : { "_index" : “imdb", "_type" : “movie", "_id" : "3" } }n { doc : {“title" : “Blade Runner" } }n
  44. 44. 44 Percolate API • Reversing Search • Store queries and filter (percolate) documents through them. • Useful for Alert/Monitoring systems curl -XPUT localhost:9200/_percolator/stocks/alert-on-nokia -d '{ "query" : { "boolean" : { "must" : [ { "term" : { "company" : "NOK" }}, { "range" : { "value" : { "lt" : "2.5" }}} ] } } }' curl -X PUT localhost:9200/stocks/stock/1?percolate=* -d '{ "doc" : { "company" : "NOK", "value" : 2.4 } }'
  45. 45. 45 Clients • Client list: http://www.elasticsearch.org/guide/clients/ • Java Client, JS, PHP, Perl, Python, Ruby • Spring Data: • Uses TransportClient • Implementation of ElasticsearchRepository aligns with generic Repository interfaces. • ElasticSearchCrudRepository extends PagingandSortingRepository • https://github.com/spring-projects/spring-data-elasticsearch @Document(indexName = "book", type = "book", indexStoreType = "memory", shards = 1, replicas = 0, refreshInterval = "-1") public class Book { … } public interface ElasticSearchBookRepository extends ElasticsearchRepository<Book, String> { }
  46. 46. 46 B’what about Mongo? • Mongo: • General purpose DB • ElasticSearch: • Distributed text search engine … that’s all I have to say about that.
  47. 47. 47 Questions?
  1. A particular slide catching your eye?

    Clipping is a handy way to collect important slides you want to go back to later.

×