INTRODUCTION TO
ELASTICSEARCH
Agenda
• Me
• ElasticSearch Basics
• Concepts
• Network / Discovery
• Data Structure
• Inverted Index
• The REST API
• Bulk API
• Percolator
• Java Integration
• Stuff I didn’t cover
Me
• Roy Russo
• Former JBoss Portal Co-Founder
• LoopFuse Co-Founder
• ElasticHQ Founder
• http://www.elastichq.org
The Basics
• Document-Oriented Search Engine
• JSON, Lucene
• No Schema
• Mapping Types
• Horizontal Scale, Distributed
• REST API
• Vibrant Ecosystem
• Apps, Plugins, Hosting
The Basics - Distro
• Download and Run
├── bin                                          (Executables)
│   ├── elasticsearch
│   ├── elasticsearch.in.sh
│   └── plugin
├── config                                       (Node Configs)
│   ├── elasticsearch.yml
│   └── logging.yml
├── data                                         (Data Storage)
│   └── cluster1
├── lib
│   ├── elasticsearch-x.y.z.jar
│   ├── ...
│   └──
└── logs                                         (Log files)
    ├── elasticsearch.log
    ├── elasticsearch_index_search_slowlog.log
    └── elasticsearch_index_indexing_slowlog.log
The Basics - Glossary
• Node = One ElasticSearch instance (1 java proc)
• Cluster = 1..N Nodes w/ same Cluster Name
• Index = Similar to a DB
• Named Collection of Documents
• Maps to 1..N Primary shards && 0..N Replica shards
• Type = Similar to a DB Table
• Document Definition
• Shard = One Lucene instance
• Distributed across all nodes in the cluster.
The Basics - Document Structure
• Modeled as a JSON object
{
"genre": "Crime",
"language": "English",
"country": "USA",
"runtime": 170,
"title": "Scarface",
"year": 1983
}
{
"_index": "imdb",
"_type": "movie",
"_id": "u17o8zy9RcKg6SjQZqQ4Ow",
"_version": 1,
"exists": true,
"_source": {
"genre": "Crime",
"language": "English",
"country": "USA",
"runtime": 170,
"title": "Scarface",
"year": 1983
}
}
The Basics - Document Structure
• Document Metadata fields
• _id
• _type : mapping type
• _source : enabled/disabled
• _timestamp
• _ttl
• _size : size of uncompressed _source
• _version
The Basics - Document Structure
• Mapping:
• ES will auto-map fields
• You can specify mapping, if needed
• Data Types:
• String
• Analyzers, Tokenizers, Filters
• Number
• Int, long, float, double, short, byte
• Boolean
• Datetime
• formatted
• geo_point, geo_shape, ip
• Attachment (requires plugin)
Lucene – Inverted Index
• Which presidential speeches contain the word “fair”?
• Go over every speech, word by word, and mark the speeches that contain it
• Linear in the number of words
• Fails at large scale
Lucene – Inverted Index
• Inverting Obama
• Take all the speeches
• Break them down by word (tokenize)
• For each word, store the IDs of the speeches
• Sort all words (tokens)
• Searching
• Finding the word is fast
• Iterate over document IDs that are referenced
Token   Doc Frequency   Doc IDs
Jobs    2               4, 8
Fair    5               1, 2, 4, 8, 42
Bush    300             1, 2, 3, 4, 5, 6, …
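The tokenize, store-IDs, and sort steps above can be sketched in a few lines of Python. This is a toy illustration, not how Lucene stores its index: the `build_inverted_index` and `search` names and the sample speeches are made up for the example, and tokenizing is reduced to lowercasing and splitting on whitespace.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each token to the sorted list of document IDs that contain it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        # Assumption: "tokenize" means lowercase + whitespace split.
        for token in text.lower().split():
            postings[token].add(doc_id)
    # Tokens are emitted in sorted order, mirroring the "sort all words" step.
    return {token: sorted(ids) for token, ids in sorted(postings.items())}


def search(index, word):
    """Look the word up directly instead of scanning every document."""
    return index.get(word.lower(), [])


# Hypothetical sample data.
speeches = {
    1: "a fair deal for workers",
    2: "fair trade and jobs",
    3: "growth and new jobs",
}
index = build_inverted_index(speeches)
print(search(index, "fair"))   # doc IDs containing "fair"
print(len(search(index, "jobs")))  # document frequency of "jobs"
```

The document frequency column in the table is simply the length of each posting list.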
Lucene – Inverted Index
• Not an algorithm
• Implementations vary
Cluster Topology
• 4 Node Cluster
• Index Configuration:
• “A”: 2 Shards, 1 Replica
• “B”: 3 Shards, 1 Replica
[Diagram: four nodes holding the 10 shard copies; each of A1, A2, B1, B2, B3 appears twice (one primary, one replica)]
The Basics - Shards
• Paths…
• Primary Shard:
• First time Indexing
• Index has 1..N primary shards (default: 5)
• Number of primary shards cannot be changed once the index is created
• Replica Shard:
• Copy of the primary shard
• Can be changed later
• Each primary has 0..N replicas
• HA:
• Promoted to primary if primary fails
• Get/Search handled by primary||replica
The Basics - Shards
• Shard Stages
• UNASSIGNED
• INITIALIZING
• STARTED
• RELOCATING
• Viewed in Cluster State
• Routing table : from indices perspective
• Routing nodes
The Basics - Searching
• How it works:
• Search request hits a node
• Node broadcasts to every shard in the index (primary & replica)
• Each shard performs query
• Each shard returns results
• Results merged, sorted, and returned to client.
• Problems:
• ES has no idea where your document is
• Randomly distributed around cluster
• Broadcast query to 100 nodes
• Performance degrades
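The broadcast-and-merge flow described above can be sketched as a scatter/gather over in-memory "shards". This is only an illustration under simplifying assumptions: the scoring (term count), the `search_shard` and `search_index` names, and the sample data are all hypothetical, and no networking is involved.

```python
import heapq


def search_shard(shard_docs, term, size):
    """Each shard runs the query locally and returns its own top hits."""
    hits = []
    for doc_id, text in shard_docs.items():
        # Toy relevance score: how often the term occurs in the document.
        score = text.lower().split().count(term)
        if score > 0:
            hits.append((score, doc_id))
    return heapq.nlargest(size, hits)


def search_index(shards, term, size=2):
    """The receiving node asks every shard, then merges and sorts the results."""
    partial = [hit for shard in shards for hit in search_shard(shard, term, size)]
    return heapq.nlargest(size, partial)


# Hypothetical index split over two shards.
shards = [
    {"a": "fair fair deal", "b": "fair"},
    {"c": "fair fair fair", "d": "jobs"},
]
print(search_index(shards, "fair"))
```

Every shard does work for every query, which is why the cost grows with the number of shards that have to be contacted.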
The Basics - Shards
• Shard Allocation Awareness
• cluster.routing.allocation.awareness.attributes: rack_id
• Example:
• 2 Nodes with node.rack_id=rack_one
• Create Index 5 shards / 1 replica (10 shards)
• Add 2 Nodes with node.rack_id=rack_two
• Shards RELOCATE to even distribution
• Primary & Replica will NOT be on the same rack_id value.
• Shard Allocation Filtering
• node.tag=val1
• index.routing.allocation.include.tag:val1,val2
curl -XPUT localhost:9200/newIndex/_settings -d '{
"index.routing.allocation.include.tag" : "val1,val2"
}'
The Basics - Routing
curl -XPOST 'localhost:9200/crunchbase/person/1?routing=xerox' -d '{
...
}'
curl -XPOST 'localhost:9200/crunchbase/_search?routing=xerox' -d '{
  "query" : {
    "filtered" : {
      "query" : { ... }
    }
  }
}'
Routing constrains a search to the shards that hold the given routing value; without it, every shard is hit.
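A rough sketch of why a routing value pins a request to one shard: the shard is chosen as a hash of the routing value modulo the number of primary shards. The hash below (`zlib.crc32`) is a stand-in chosen for the example, not the hash function ElasticSearch uses, and `shard_for` is a hypothetical helper.

```python
import zlib


def shard_for(routing_value, num_primary_shards):
    """Pick a shard as hash(routing) % number_of_primary_shards."""
    # Assumption: crc32 stands in for the engine's real hash function.
    digest = zlib.crc32(routing_value.encode("utf-8"))
    return digest % num_primary_shards


# Indexing and searching with the same routing value land on the same shard.
print(shard_for("xerox", 5))
print(shard_for("xerox", 5) == shard_for("xerox", 5))
```

The same formula also suggests why the primary shard count is fixed at index creation: changing the modulus would send existing routing values to different shards.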
Discovery
• Nodes discover each other using multicast.
• Unicast is an option
• Each cluster has an elected master node
• Beware of split-brain
• discovery.zen.minimum_master_nodes
• N/2+1, where N>2
• N: # of master nodes in cluster
discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: ["host1", "host2:port", "host3"]
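The N/2+1 rule for discovery.zen.minimum_master_nodes is plain integer arithmetic. A minimal sketch, with `minimum_master_nodes` as a hypothetical helper name:

```python
def minimum_master_nodes(master_eligible_nodes):
    """Quorum size: more than half of the master-eligible nodes."""
    return master_eligible_nodes // 2 + 1


for n in (3, 4, 5):
    print(n, "master-eligible nodes ->", minimum_master_nodes(n))
```

With a quorum of more than half, two halves of a partitioned cluster cannot both elect a master, which is the split-brain case the slide warns about.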
Nodes
• Master node handles cluster-wide (Meta-API) events:
• Node participation
• New indices create/delete
• Re-Allocation of shards
• Data Nodes
• Indexing / Searching operations
• Client Nodes
• REST calls
• Light-weight load balancers
• Beware of Heap Size
• ES_HEAP_SIZE: ~1/2 machine memory
Cluster State
• Cluster State
• Node Membership
• Indices Settings and Mappings (Types)
• Shard Allocation Table
• Shard State
• curl -XGET 'http://localhost:9200/_cluster/state?pretty=1'
Cluster State
• Changes in State published from Master to other nodes
PUT /newIndex
Step   Node 1 (M)   Node 2   Node 3
1      CS1          CS1      CS1
2      CS2          CS1      CS1
3      CS2          CS2      CS2
REST API
• Building IMDB
• Two Indexes
REST API
• Create Index
• action.auto_create_index: 0
• Index Document
• Dynamic type mapping
• Versioning
• ID specification
• Parent / Child (/1122?parent=1111)
• Explicit Refresh (?refresh=1)
• Timeout flag (?timeout=5m)
REST API – Versioning
• Every document is Versioned
• Version assigned on creation
• Version number can be assigned
• Re-Index, Update, and Delete update Version
REST API - Update
• Update using partial data
• Partial doc merged with existing
• Fails if document doesn’t exist
• “Upsert” data is used to create the doc if it doesn’t exist
{
  "upsert" : {
    "title" : "Blade Runner"
  }
}
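The update semantics above (merge a partial doc, fail when the document is missing, fall back to upsert data) can be modelled with a small in-memory store. This is a sketch under stated assumptions: the `update` function and the dict-based store are hypothetical, and the merge is shallow, which may differ from how nested objects are merged by the real API.

```python
def update(store, doc_id, partial_doc, upsert=None):
    """Merge partial_doc into an existing document, or create it from upsert."""
    if doc_id not in store:
        if upsert is None:
            # Mirrors "fails if document doesn't exist".
            raise KeyError(f"document {doc_id!r} is missing and no upsert given")
        store[doc_id] = {"_version": 1, "_source": dict(upsert)}
        return store[doc_id]
    entry = store[doc_id]
    entry["_source"].update(partial_doc)  # assumption: shallow merge only
    entry["_version"] += 1                # every update bumps the version
    return entry


store = {}
update(store, "3", {"year": 1982}, upsert={"title": "Blade Runner"})
update(store, "3", {"year": 1982})
print(store["3"])
```

On the first call the upsert data creates the document; on the second the partial doc is merged and the version is incremented.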
REST API
• Exists
• No overhead in loading
• Status Code Result
• Delete
• Get
• Multi-Get
{
  "docs" : [
    {
      "_id" : "1",
      "_index" : "imdb",
      "_type" : "movie"
    },
    {
      "_id" : "5",
      "_index" : "oldmovies",
      "_type" : "movie",
      "_fields" : ["title", "genre"]
    }
  ]
}
REST API - Search
• Free Text Search
• URL Request
• http://localhost:9200/imdb/movie/_search?q=scar*
• Complex Query
• http://localhost:9200/imdb/movie/_search?q=scarface+OR+star
• http://localhost:9200/imdb/movie/_search?q=(scarface+OR+star)+AND+year:[1981+TO+1984]
• Term, Boolean, range, fuzzy, etc…
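When building such search URLs in code, it is safer to let a library encode the query string than to concatenate it by hand. A minimal sketch using only the Python standard library; `search_url` is a hypothetical helper, and the host, index, and type are taken from the examples above.

```python
from urllib.parse import urlencode


def search_url(base, index, doc_type, query, **params):
    """Build a URI search request with a properly encoded q parameter."""
    query_string = urlencode({"q": query, **params})
    return f"{base}/{index}/{doc_type}/_search?{query_string}"


url = search_url(
    "http://localhost:9200", "imdb", "movie",
    "(scarface OR star) AND year:[1981 TO 1984]",
)
print(url)
```

Spaces become `+` and the brackets, parentheses, and colon are percent-encoded, so the Lucene syntax survives the trip through the URL.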
REST API - Search
• Search Types:
• http://localhost:9200/imdb/movie/_search?q=(scarface+OR+star)+AND+year:[1981+TO+1984]&search_type=count
• http://localhost:9200/imdb/movie/_search?q=(scarface+OR+star)+AND+year:[1981+TO+1984]&search_type=query_then_fetch
• Query and Fetch:
• Executes the query on all shards and returns the results
• Query then Fetch:
• Executes the query on all shards, but returns only the information needed to rank and sort; documents are then fetched from the relevant shards only
REST API – Query DSL
http://localhost:9200/imdb/movie/_search?q=(scarface+OR+star)+AND+year:[1981+TO+1984]
Becomes…
curl -XPOST 'localhost:9200/_search?pretty' -d '{
  "query" : {
    "bool" : {
      "must" : [
        {
          "query_string" : {
            "query" : "scarface OR star"
          }
        },
        {
          "range" : {
            "year" : { "gte" : 1981, "lte" : 1984 }
          }
        }
      ]
    }
  }
}'
REST API – Query DSL
• Query string requests use Lucene query syntax
• Limited
• Error-prone
• Instead use “match” query
• Automatically builds a boolean query
curl -XPOST 'localhost:9200/_search?pretty' -d '{
  "query" : {
    "bool" : {
      "must" : [
        {
          "match" : {
            "message" : "scarface star"
          }
        },
        {
          "range" : {
            "year" : { "gte" : 1981 }
          }
        }
      ]
…
REST API – Query DSL
• Match Query
• Boolean Query
• Must: document must match query
• Must_not: document must not match query
• Should: document doesn’t have to match
• If it matches… higher score
• Compound queries
{
  "bool":{
    "must":[
      {
        "match":{
          "color":"blue"
        }
      },
      {
        "match":{
          "title":"shirt"
        }
      }
    ],
    "must_not":[
      {
        "match":{
          "size":"xxl"
        }
      }
    ],
    "should":[
      {
        "match":{
          "textile":"cotton"
        }
      }
    ]
  }
}

{
  "match":{
    "title":{
      "type":"phrase",
      "query":"quick fox",
      "slop":1
    }
  }
}
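The must / must_not / should semantics can be made concrete with a toy evaluator over plain dictionaries. This is an illustration only: `matches` and `bool_score` are hypothetical names, "match" is reduced to exact equality with no analysis, and the score is a simple count rather than real relevance scoring.

```python
def matches(doc, clause):
    """Toy 'match': the field value must equal the queried value exactly."""
    (field, value), = clause["match"].items()
    return doc.get(field) == value


def bool_score(doc, query):
    """Return None if the doc is excluded, otherwise a toy score."""
    clauses = query["bool"]
    if not all(matches(doc, c) for c in clauses.get("must", [])):
        return None  # every must clause has to match
    if any(matches(doc, c) for c in clauses.get("must_not", [])):
        return None  # any must_not match excludes the document
    # should clauses are optional; each one that matches raises the score.
    return 1 + sum(matches(doc, c) for c in clauses.get("should", []))


query = {
    "bool": {
        "must": [{"match": {"color": "blue"}}, {"match": {"title": "shirt"}}],
        "must_not": [{"match": {"size": "xxl"}}],
        "should": [{"match": {"textile": "cotton"}}],
    }
}
print(bool_score({"color": "blue", "title": "shirt", "textile": "cotton"}, query))
print(bool_score({"color": "blue", "title": "shirt", "size": "xxl"}, query))
```

A cotton shirt scores higher than a polyester one, while an XXL shirt is excluded no matter what else matches.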
REST API – Query DSL
• Range Query
• Numeric / Date Types
• Prefix/Wildcard Query
• Match on partial terms
• Fuzzy Query
• Similar looking text matched.
• RegExp Query
{
"range":{
"founded_year":{
"gte":1990,
"lt":2000
}
}
}
REST API – Query DSL
• Geo_bbox
• Bounding box filter
• Geo_distance
• Geo_distance_range
{
"query":{
"filtered":{
"query":{
"match_all":{
}
},
"filter":{
"geo_bbox":{
"location":{
"top_left":{
"lat":40.73,
"lon":-74.1
},
"bottom_right":{
"lat":40.717,
"lon":-73.99
}
…
{
  "query":{
    "filtered":{
      "query":{
        "match_all":{
        }
      },
      "filter":{
        "geo_distance":{
          "distance":"400km",
          "location":{
            "lat":40.73,
            "lon":-74.1
          }
        }
      }
    }
  }
}
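What the two geo filters test can be sketched with basic arithmetic: a bounding-box check compares latitude and longitude against the box corners, and a distance check compares a great-circle distance against a radius. The functions below are hypothetical helpers using the haversine formula on a spherical Earth; they do not handle boxes that cross the antimeridian and are not the engine's implementation.

```python
import math


def in_bounding_box(lat, lon, top_left, bottom_right):
    """True if the point lies between the two corners of the box."""
    return (bottom_right["lat"] <= lat <= top_left["lat"]
            and top_left["lon"] <= lon <= bottom_right["lon"])


def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance in km, assuming a spherical Earth."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))


# Corners taken from the geo_bbox example above.
top_left = {"lat": 40.73, "lon": -74.1}
bottom_right = {"lat": 40.717, "lon": -73.99}
print(in_bounding_box(40.72, -74.0, top_left, bottom_right))
# Distance filter: keep the point if it is within 400 km of the origin point.
print(haversine_km(40.73, -74.1, 40.72, -74.0) <= 400)
```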
REST API – Bulk Operations
• Bulk API
• Minimize round trips with index/delete ops
• Individual response for every request action
• In order
• Failure of one action will not stop subsequent actions.
• localhost:9200/_bulk
• No pretty-printing. Terminate every line with \n
{ "delete" : { "_index" : "imdb", "_type" : "movie", "_id" : "2" } }\n
{ "index" : { "_index" : "imdb", "_type" : "actor", "_id" : "1" } }\n
{ "first_name" : "Tony", "last_name" : "Soprano" }\n
...
{ "update" : { "_index" : "imdb", "_type" : "movie", "_id" : "3" } }\n
{ "doc" : { "title" : "Blade Runner" } }\n
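Because the bulk body is newline-delimited JSON rather than one JSON document, it is easy to get wrong by hand. A small sketch of generating it with the standard library; `bulk_body` and the tuple layout of `actions` are assumptions made for this example, and sending the body to localhost:9200/_bulk is left out.

```python
import json


def bulk_body(actions):
    """Serialize (action, metadata, source) tuples into a bulk request body."""
    lines = []
    for action, meta, source in actions:
        lines.append(json.dumps({action: meta}))  # action-and-metadata line
        if source is not None:                    # delete has no source line
            lines.append(json.dumps(source))
    # One JSON object per line, and the body must end with a newline.
    return "\n".join(lines) + "\n"


actions = [
    ("delete", {"_index": "imdb", "_type": "movie", "_id": "2"}, None),
    ("index", {"_index": "imdb", "_type": "actor", "_id": "1"},
     {"first_name": "Tony", "last_name": "Soprano"}),
    ("update", {"_index": "imdb", "_type": "movie", "_id": "3"},
     {"doc": {"title": "Blade Runner"}}),
]
body = bulk_body(actions)
print(body, end="")
```

`json.dumps` never emits raw newlines with its default settings, which satisfies the "no pretty-printing" rule automatically.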
Percolate API
• Reversing Search
• Store queries and filter (percolate) documents through them.
curl -XPUT localhost:9200/_percolator/stocks/alert-on-nokia -d '{
  "query" : {
    "bool" : {
      "must" : [
        { "term" : { "company" : "NOK" }},
        { "range" : { "value" : { "lt" : "2.5" }}}
      ]
    }
  }
}'
curl -XGET localhost:9200/stocks/stock/_percolate -d '{
  "doc" : {
    "company" : "NOK",
    "value" : 2.4
  }
}'
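The "reversed search" idea can be modelled in memory: queries are stored first, and each incoming document is tested against all of them. In this sketch the stored queries are plain Python predicates rather than query DSL, and the `Percolator` class is hypothetical; it only illustrates the direction of the lookup.

```python
class Percolator:
    """Stores named queries and reports which ones a document matches."""

    def __init__(self):
        self._queries = {}

    def register(self, name, predicate):
        # Assumption: a query is any callable taking a doc and returning bool.
        self._queries[name] = predicate

    def percolate(self, doc):
        return sorted(name for name, pred in self._queries.items() if pred(doc))


percolator = Percolator()
percolator.register(
    "alert-on-nokia",
    lambda doc: doc.get("company") == "NOK" and doc.get("value", float("inf")) < 2.5,
)
print(percolator.percolate({"company": "NOK", "value": 2.4}))
```

The document itself is never stored; only the names of the matching queries come back, which is what makes this useful for alerting.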
Java Integration
• Client list: http://www.elasticsearch.org/guide/clients/
• Java Client, JEST
• Limited
• https://github.com/searchbox-io/Jest
• Spring Data:
• Uses TransportClient
• Implementation of ElasticsearchRepository aligns with generic Repository interfaces.
• ElasticsearchCrudRepository extends PagingAndSortingRepository
• https://github.com/spring-projects/spring-data-elasticsearch
@Document(indexName = "book", type = "book", indexStoreType = "memory", shards = 1, replicas = 0, refreshInterval = "-1")
public class Book {
…
}
public interface ElasticSearchBookRepository extends ElasticsearchRepository<Book, String> {
}
Stuff I didn’t cover…
• Analyzers
• Tokenizers
• Token Filters
• Rivers
• RabbitMQ, MySQL, JDBC
$ curl -XGET 'localhost:9200/_analyze?tokenizer=whitespace&filters=lowercase,stop&pretty=1' -d '
The quick Fox Jumped
'
{
"tokens" : [ {
"token" : "quick",
"start_offset" : 5,
"end_offset" : 10,
"type" : "word",
"position" : 2
}, {
"token" : "fox",
"start_offset" : 11,
"end_offset" : 14,
"type" : "word",
"position" : 3
}, {
"token" : "jumped",
"start_offset" : 15,
"end_offset" : 21,
"type" : "word",
"position" : 4
} ]
}
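The analysis chain in this request (whitespace tokenizer, then lowercase and stop filters) can be approximated in a few lines to show where the offsets and positions come from. The `analyze` function and the small stop-word set are assumptions for this sketch, not the engine's actual lists; positions are counted from 1 to match the output above.

```python
import re

# Assumption: a tiny stand-in for the stop filter's default word list.
STOP_WORDS = {"a", "an", "and", "of", "the", "to"}


def analyze(text):
    """Whitespace tokenizer followed by lowercase and stop-word filters."""
    tokens = []
    for position, match in enumerate(re.finditer(r"\S+", text), start=1):
        term = match.group().lower()
        if term in STOP_WORDS:
            continue  # the token is dropped but its position is still used up
        tokens.append({
            "token": term,
            "start_offset": match.start(),
            "end_offset": match.end(),
            "position": position,
        })
    return tokens


# The request body above starts with a newline, which shifts offsets by one.
for token in analyze("\nThe quick Fox Jumped\n"):
    print(token)
```

"The" is removed by the stop filter, which is why the first surviving token sits at position 2.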
But what about Mongo?
• Mongo:
• General purpose DB
• ElasticSearch:
• Distributed text search engine
… that’s all I have to say about that.
Questions?
40

More Related Content

What's hot

Elasticsearch Basics
Elasticsearch BasicsElasticsearch Basics
Elasticsearch Basics
Shifa Khan
 
Managing Your Content with Elasticsearch
Managing Your Content with ElasticsearchManaging Your Content with Elasticsearch
Managing Your Content with Elasticsearch
Samantha Quiñones
 
Dcm#8 elastic search
Dcm#8  elastic searchDcm#8  elastic search
Dcm#8 elastic search
Ivan Wallarm
 
Simple search with elastic search
Simple search with elastic searchSimple search with elastic search
Simple search with elastic search
markstory
 
Intro to elasticsearch
Intro to elasticsearchIntro to elasticsearch
Intro to elasticsearch
Joey Wen
 
You know, for search. Querying 24 Billion Documents in 900ms
You know, for search. Querying 24 Billion Documents in 900msYou know, for search. Querying 24 Billion Documents in 900ms
You know, for search. Querying 24 Billion Documents in 900ms
Jodok Batlogg
 
Elasticsearch quick Intro (English)
Elasticsearch quick Intro (English)Elasticsearch quick Intro (English)
Elasticsearch quick Intro (English)
Federico Panini
 
Your Data, Your Search, ElasticSearch (EURUKO 2011)
Your Data, Your Search, ElasticSearch (EURUKO 2011)Your Data, Your Search, ElasticSearch (EURUKO 2011)
Your Data, Your Search, ElasticSearch (EURUKO 2011)
Karel Minarik
 
Solr and Elasticsearch, a performance study
Solr and Elasticsearch, a performance studySolr and Elasticsearch, a performance study
Solr and Elasticsearch, a performance study
Charlie Hull
 
An Introduction to Elastic Search.
An Introduction to Elastic Search.An Introduction to Elastic Search.
An Introduction to Elastic Search.
Jurriaan Persyn
 
Workshop: Learning Elasticsearch
Workshop: Learning ElasticsearchWorkshop: Learning Elasticsearch
Workshop: Learning Elasticsearch
Anurag Patel
 
Building a CRM on top of ElasticSearch
Building a CRM on top of ElasticSearchBuilding a CRM on top of ElasticSearch
Building a CRM on top of ElasticSearch
Mark Greene
 
Query DSL In Elasticsearch
Query DSL In ElasticsearchQuery DSL In Elasticsearch
Query DSL In Elasticsearch
Knoldus Inc.
 
Elasticsearch - under the hood
Elasticsearch - under the hoodElasticsearch - under the hood
Elasticsearch - under the hood
SmartCat
 
Elasticsearch: You know, for search! and more!
Elasticsearch: You know, for search! and more!Elasticsearch: You know, for search! and more!
Elasticsearch: You know, for search! and more!
Philips Kokoh Prasetyo
 
Elasticsearch 101 - Cluster setup and tuning
Elasticsearch 101 - Cluster setup and tuningElasticsearch 101 - Cluster setup and tuning
Elasticsearch 101 - Cluster setup and tuning
Petar Djekic
 
Elasticsearch in 15 minutes
Elasticsearch in 15 minutesElasticsearch in 15 minutes
Elasticsearch in 15 minutes
David Pilato
 
Battle of the Giants - Apache Solr vs. Elasticsearch (ApacheCon)
Battle of the Giants - Apache Solr vs. Elasticsearch (ApacheCon)Battle of the Giants - Apache Solr vs. Elasticsearch (ApacheCon)
Battle of the Giants - Apache Solr vs. Elasticsearch (ApacheCon)Sematext Group, Inc.
 
Elasticsearch presentation 1
Elasticsearch presentation 1Elasticsearch presentation 1
Elasticsearch presentation 1
Maruf Hassan
 
Elastic Search
Elastic SearchElastic Search
Elastic Search
Navule Rao
 

What's hot (20)

Elasticsearch Basics
Elasticsearch BasicsElasticsearch Basics
Elasticsearch Basics
 
Managing Your Content with Elasticsearch
Managing Your Content with ElasticsearchManaging Your Content with Elasticsearch
Managing Your Content with Elasticsearch
 
Dcm#8 elastic search
Dcm#8  elastic searchDcm#8  elastic search
Dcm#8 elastic search
 
Simple search with elastic search
Simple search with elastic searchSimple search with elastic search
Simple search with elastic search
 
Intro to elasticsearch
Intro to elasticsearchIntro to elasticsearch
Intro to elasticsearch
 
You know, for search. Querying 24 Billion Documents in 900ms
You know, for search. Querying 24 Billion Documents in 900msYou know, for search. Querying 24 Billion Documents in 900ms
You know, for search. Querying 24 Billion Documents in 900ms
 
Elasticsearch quick Intro (English)
Elasticsearch quick Intro (English)Elasticsearch quick Intro (English)
Elasticsearch quick Intro (English)
 
Your Data, Your Search, ElasticSearch (EURUKO 2011)
Your Data, Your Search, ElasticSearch (EURUKO 2011)Your Data, Your Search, ElasticSearch (EURUKO 2011)
Your Data, Your Search, ElasticSearch (EURUKO 2011)
 
Solr and Elasticsearch, a performance study
Solr and Elasticsearch, a performance studySolr and Elasticsearch, a performance study
Solr and Elasticsearch, a performance study
 
An Introduction to Elastic Search.
An Introduction to Elastic Search.An Introduction to Elastic Search.
An Introduction to Elastic Search.
 
Workshop: Learning Elasticsearch
Workshop: Learning ElasticsearchWorkshop: Learning Elasticsearch
Workshop: Learning Elasticsearch
 
Building a CRM on top of ElasticSearch
Building a CRM on top of ElasticSearchBuilding a CRM on top of ElasticSearch
Building a CRM on top of ElasticSearch
 
Query DSL In Elasticsearch
Query DSL In ElasticsearchQuery DSL In Elasticsearch
Query DSL In Elasticsearch
 
Elasticsearch - under the hood
Elasticsearch - under the hoodElasticsearch - under the hood
Elasticsearch - under the hood
 
Elasticsearch: You know, for search! and more!
Elasticsearch: You know, for search! and more!Elasticsearch: You know, for search! and more!
Elasticsearch: You know, for search! and more!
 
Elasticsearch 101 - Cluster setup and tuning
Elasticsearch 101 - Cluster setup and tuningElasticsearch 101 - Cluster setup and tuning
Elasticsearch 101 - Cluster setup and tuning
 
Elasticsearch in 15 minutes
Elasticsearch in 15 minutesElasticsearch in 15 minutes
Elasticsearch in 15 minutes
 
Battle of the Giants - Apache Solr vs. Elasticsearch (ApacheCon)
Battle of the Giants - Apache Solr vs. Elasticsearch (ApacheCon)Battle of the Giants - Apache Solr vs. Elasticsearch (ApacheCon)
Battle of the Giants - Apache Solr vs. Elasticsearch (ApacheCon)
 
Elasticsearch presentation 1
Elasticsearch presentation 1Elasticsearch presentation 1
Elasticsearch presentation 1
 
Elastic Search
Elastic SearchElastic Search
Elastic Search
 

Viewers also liked

How ElasticSearch lives in my DevOps life
How ElasticSearch lives in my DevOps lifeHow ElasticSearch lives in my DevOps life
How ElasticSearch lives in my DevOps life琛琳 饶
 
Logging with Elasticsearch, Logstash & Kibana
Logging with Elasticsearch, Logstash & KibanaLogging with Elasticsearch, Logstash & Kibana
Logging with Elasticsearch, Logstash & KibanaAmazee Labs
 
Attack monitoring using ElasticSearch Logstash and Kibana
Attack monitoring using ElasticSearch Logstash and KibanaAttack monitoring using ElasticSearch Logstash and Kibana
Attack monitoring using ElasticSearch Logstash and Kibana
Prajal Kulkarni
 
WordPress 2017 with VueJS and GraphQL
WordPress 2017 with VueJS and GraphQLWordPress 2017 with VueJS and GraphQL
WordPress 2017 with VueJS and GraphQL
houzman
 
[2 d1] elasticsearch 성능 최적화
[2 d1] elasticsearch 성능 최적화[2 d1] elasticsearch 성능 최적화
[2 d1] elasticsearch 성능 최적화
Henry Jeong
 
Elasticsearch 설치 및 기본 활용
Elasticsearch 설치 및 기본 활용Elasticsearch 설치 및 기본 활용
Elasticsearch 설치 및 기본 활용
종민 김
 
Monitoring and Log Management for
Monitoring and Log Management forMonitoring and Log Management for
Monitoring and Log Management for
Sematext Group, Inc.
 
Central LogFile Storage. ELK stack Elasticsearch, Logstash and Kibana.
Central LogFile Storage. ELK stack Elasticsearch, Logstash and Kibana.Central LogFile Storage. ELK stack Elasticsearch, Logstash and Kibana.
Central LogFile Storage. ELK stack Elasticsearch, Logstash and Kibana.
Airat Khisamov
 
Vue 2.0 + Vuex Router & Vuex at Vue.js
Vue 2.0 + Vuex Router & Vuex at Vue.jsVue 2.0 + Vuex Router & Vuex at Vue.js
Vue 2.0 + Vuex Router & Vuex at Vue.js
Takuya Tejima
 
Logstash
LogstashLogstash
Logstash
琛琳 饶
 
Scaling an ELK stack at bol.com
Scaling an ELK stack at bol.comScaling an ELK stack at bol.com
Scaling an ELK stack at bol.com
Renzo Tomà
 
elasticsearch_적용 및 활용_정리
elasticsearch_적용 및 활용_정리elasticsearch_적용 및 활용_정리
elasticsearch_적용 및 활용_정리
Junyi Song
 
Elk stack
Elk stackElk stack
Elk stack
Jilles van Gurp
 
Cyber Security 101: Training, awareness, strategies for small to medium sized...
Cyber Security 101: Training, awareness, strategies for small to medium sized...Cyber Security 101: Training, awareness, strategies for small to medium sized...
Cyber Security 101: Training, awareness, strategies for small to medium sized...
Stephen Cobb
 
Vue, vue router, vuex
Vue, vue router, vuexVue, vue router, vuex
Vue, vue router, vuex
Samundra khatri
 

Viewers also liked (15)

How ElasticSearch lives in my DevOps life
How ElasticSearch lives in my DevOps lifeHow ElasticSearch lives in my DevOps life
How ElasticSearch lives in my DevOps life
 
Logging with Elasticsearch, Logstash & Kibana
Logging with Elasticsearch, Logstash & KibanaLogging with Elasticsearch, Logstash & Kibana
Logging with Elasticsearch, Logstash & Kibana
 
Attack monitoring using ElasticSearch Logstash and Kibana
Attack monitoring using ElasticSearch Logstash and KibanaAttack monitoring using ElasticSearch Logstash and Kibana
Attack monitoring using ElasticSearch Logstash and Kibana
 
WordPress 2017 with VueJS and GraphQL
WordPress 2017 with VueJS and GraphQLWordPress 2017 with VueJS and GraphQL
WordPress 2017 with VueJS and GraphQL
 
[2 d1] elasticsearch 성능 최적화
[2 d1] elasticsearch 성능 최적화[2 d1] elasticsearch 성능 최적화
[2 d1] elasticsearch 성능 최적화
 
Elasticsearch 설치 및 기본 활용
Elasticsearch 설치 및 기본 활용Elasticsearch 설치 및 기본 활용
Elasticsearch 설치 및 기본 활용
 
Monitoring and Log Management for
Monitoring and Log Management forMonitoring and Log Management for
Monitoring and Log Management for
 
Central LogFile Storage. ELK stack Elasticsearch, Logstash and Kibana.
Central LogFile Storage. ELK stack Elasticsearch, Logstash and Kibana.Central LogFile Storage. ELK stack Elasticsearch, Logstash and Kibana.
Central LogFile Storage. ELK stack Elasticsearch, Logstash and Kibana.
 
Vue 2.0 + Vuex Router & Vuex at Vue.js
Vue 2.0 + Vuex Router & Vuex at Vue.jsVue 2.0 + Vuex Router & Vuex at Vue.js
Vue 2.0 + Vuex Router & Vuex at Vue.js
 
Logstash
LogstashLogstash
Logstash
 
Scaling an ELK stack at bol.com
Scaling an ELK stack at bol.comScaling an ELK stack at bol.com
Scaling an ELK stack at bol.com
 
elasticsearch_적용 및 활용_정리
elasticsearch_적용 및 활용_정리elasticsearch_적용 및 활용_정리
elasticsearch_적용 및 활용_정리
 
Elk stack
Elk stackElk stack
Elk stack
 
Cyber Security 101: Training, awareness, strategies for small to medium sized...
Cyber Security 101: Training, awareness, strategies for small to medium sized...Cyber Security 101: Training, awareness, strategies for small to medium sized...
Cyber Security 101: Training, awareness, strategies for small to medium sized...
 
Vue, vue router, vuex
Vue, vue router, vuexVue, vue router, vuex
Vue, vue router, vuex
 

Similar to ElasticSearch AJUG 2013

Percona Live London 2014: Serve out any page with an HA Sphinx environment
Percona Live London 2014: Serve out any page with an HA Sphinx environmentPercona Live London 2014: Serve out any page with an HA Sphinx environment
Percona Live London 2014: Serve out any page with an HA Sphinx environment
spil-engineering
 
Elk presentation1#3
Elk presentation1#3Elk presentation1#3
Elk presentation1#3
uzzal basak
 
Dev nexus 2017
Dev nexus 2017Dev nexus 2017
Dev nexus 2017
Roy Russo
 
Devnexus 2018
Devnexus 2018Devnexus 2018
Devnexus 2018
Roy Russo
 
曾勇 Elastic search-intro
曾勇 Elastic search-intro曾勇 Elastic search-intro
曾勇 Elastic search-intro
Shaoning Pan
 
ElasticSearch: Distributed Multitenant NoSQL Datastore and Search Engine
ElasticSearch: Distributed Multitenant NoSQL Datastore and Search EngineElasticSearch: Distributed Multitenant NoSQL Datastore and Search Engine
ElasticSearch: Distributed Multitenant NoSQL Datastore and Search Engine
Daniel N
 
Elastic search intro-@lamper
Elastic search intro-@lamperElastic search intro-@lamper
Elastic search intro-@lamper
medcl
 
SQL To NoSQL - Top 6 Questions Before Making The Move
SQL To NoSQL - Top 6 Questions Before Making The MoveSQL To NoSQL - Top 6 Questions Before Making The Move
SQL To NoSQL - Top 6 Questions Before Making The Move
IBM Cloud Data Services
 
Couchbase Data Platform | Big Data Demystified
Couchbase Data Platform | Big Data DemystifiedCouchbase Data Platform | Big Data Demystified
Couchbase Data Platform | Big Data Demystified
Omid Vahdaty
 
About elasticsearch
About elasticsearchAbout elasticsearch
About elasticsearch
Minsoo Jun
 
Lessons learned while building Omroep.nl
Lessons learned while building Omroep.nlLessons learned while building Omroep.nl
Lessons learned while building Omroep.nlbartzon
 
Full Text Search In PostgreSQL
Full Text Search In PostgreSQLFull Text Search In PostgreSQL
Full Text Search In PostgreSQL
Karwin Software Solutions LLC
 
SDEC2011 NoSQL concepts and models
SDEC2011 NoSQL concepts and modelsSDEC2011 NoSQL concepts and models
SDEC2011 NoSQL concepts and modelsKorea Sdec
 
Lessons learned while building Omroep.nl
Lessons learned while building Omroep.nlLessons learned while building Omroep.nl
Lessons learned while building Omroep.nl
tieleman
 
N1QL: What's new in Couchbase 5.0
N1QL: What's new in Couchbase 5.0N1QL: What's new in Couchbase 5.0
N1QL: What's new in Couchbase 5.0
Keshav Murthy
 
ログ収集プラットフォーム開発におけるElasticsearchの運用
ログ収集プラットフォーム開発におけるElasticsearchの運用ログ収集プラットフォーム開発におけるElasticsearchの運用
ログ収集プラットフォーム開発におけるElasticsearchの運用
LINE Corporation
 
Test driving Azure Search and DocumentDB
Test driving Azure Search and DocumentDBTest driving Azure Search and DocumentDB
Test driving Azure Search and DocumentDB
Andrew Siemer
 
Elastic Search Training#1 (brief tutorial)-ESCC#1
Elastic Search Training#1 (brief tutorial)-ESCC#1Elastic Search Training#1 (brief tutorial)-ESCC#1
Elastic Search Training#1 (brief tutorial)-ESCC#1
medcl
 
Agility and Scalability with MongoDB
Agility and Scalability with MongoDBAgility and Scalability with MongoDB
Agility and Scalability with MongoDB
MongoDB
 
ELK stack introduction
ELK stack introduction ELK stack introduction
ELK stack introduction
abenyeung1
 

Similar to ElasticSearch AJUG 2013 (20)

Percona Live London 2014: Serve out any page with an HA Sphinx environment
Percona Live London 2014: Serve out any page with an HA Sphinx environmentPercona Live London 2014: Serve out any page with an HA Sphinx environment
Percona Live London 2014: Serve out any page with an HA Sphinx environment
 
Elk presentation1#3
Elk presentation1#3Elk presentation1#3
Elk presentation1#3
 
Dev nexus 2017
Dev nexus 2017Dev nexus 2017
Dev nexus 2017
 
Devnexus 2018
Devnexus 2018Devnexus 2018
Devnexus 2018
 
曾勇 Elastic search-intro
曾勇 Elastic search-intro曾勇 Elastic search-intro
曾勇 Elastic search-intro
 
ElasticSearch: Distributed Multitenant NoSQL Datastore and Search Engine
ElasticSearch: Distributed Multitenant NoSQL Datastore and Search EngineElasticSearch: Distributed Multitenant NoSQL Datastore and Search Engine
ElasticSearch: Distributed Multitenant NoSQL Datastore and Search Engine
 
Elastic search intro-@lamper
Elastic search intro-@lamperElastic search intro-@lamper
Elastic search intro-@lamper
 
SQL To NoSQL - Top 6 Questions Before Making The Move
SQL To NoSQL - Top 6 Questions Before Making The MoveSQL To NoSQL - Top 6 Questions Before Making The Move
SQL To NoSQL - Top 6 Questions Before Making The Move
 
Couchbase Data Platform | Big Data Demystified
Couchbase Data Platform | Big Data DemystifiedCouchbase Data Platform | Big Data Demystified
Couchbase Data Platform | Big Data Demystified
 
About elasticsearch
About elasticsearchAbout elasticsearch
About elasticsearch
 
Lessons learned while building Omroep.nl
Lessons learned while building Omroep.nlLessons learned while building Omroep.nl
Lessons learned while building Omroep.nl
 
Full Text Search In PostgreSQL
Full Text Search In PostgreSQLFull Text Search In PostgreSQL
Full Text Search In PostgreSQL
 
SDEC2011 NoSQL concepts and models
SDEC2011 NoSQL concepts and modelsSDEC2011 NoSQL concepts and models
SDEC2011 NoSQL concepts and models
 
Lessons learned while building Omroep.nl
Lessons learned while building Omroep.nlLessons learned while building Omroep.nl
Lessons learned while building Omroep.nl
 
N1QL: What's new in Couchbase 5.0
N1QL: What's new in Couchbase 5.0N1QL: What's new in Couchbase 5.0
N1QL: What's new in Couchbase 5.0
 
ログ収集プラットフォーム開発におけるElasticsearchの運用
ログ収集プラットフォーム開発におけるElasticsearchの運用ログ収集プラットフォーム開発におけるElasticsearchの運用
ログ収集プラットフォーム開発におけるElasticsearchの運用
 
Test driving Azure Search and DocumentDB
Test driving Azure Search and DocumentDBTest driving Azure Search and DocumentDB
Test driving Azure Search and DocumentDB
 
Elastic Search Training#1 (brief tutorial)-ESCC#1
Elastic Search Training#1 (brief tutorial)-ESCC#1Elastic Search Training#1 (brief tutorial)-ESCC#1
Elastic Search Training#1 (brief tutorial)-ESCC#1
 
Agility and Scalability with MongoDB
Agility and Scalability with MongoDBAgility and Scalability with MongoDB
Agility and Scalability with MongoDB
 
ELK stack introduction
ELK stack introduction ELK stack introduction
ELK stack introduction
 

Recently uploaded

Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Inflectra
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
DanBrown980551
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
James Anderson
 
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Jeffrey Haguewood
 

ElasticSearch AJUG 2013

  • 2. Agenda • Me • ElasticSearch Basics • Concepts • Network / Discovery • Data Structure • Inverted Index • The REST API • Bulk API • Percolator • Java Integration • Stuff I didn’t cover 2
  • 3. Me 3 • Roy Russo • Former JBoss Portal Co-Founder • LoopFuse Co-Founder • ElasticHQ Founder • http://www.elastichq.org
  • 4. The Basics • Document - Oriented Search Engine • JSON, Lucene • No Schema • Mapping Types • Horizontal Scale, Distributed • REST API • Vibrant Ecosystem • Apps, Plugins, Hosting 4
  • 5. The Basics - Distro • Download and Run 5 ├── bin │ ├── elasticsearch │ ├── elasticsearch.in.sh │ └── plugin ├── config │ ├── elasticsearch.yml │ └── logging.yml ├── data │ └── cluster1 ├── lib │ ├── elasticsearch-x.y.z.jar │ ├── ... │ └── └── logs ├── elasticsearch.log └── elasticsearch_index_search_slowlog.log └── elasticsearch_index_indexing_slowlog.log Executables Log files Node Configs Data Storage
  • 6. The Basics - Glossary • Node = One ElasticSearch instance (1 java proc) • Cluster = 1..N Nodes w/ same Cluster Name • Index = Similar to a DB • Named Collection of Documents • Maps to 1..N Primary shards && 0..N Replica shards • Type = Similar to a DB Table • Document Definition • Shard = One Lucene instance • Distributed across all nodes in the cluster. 6
  • 7. The Basics - Document Structure • Modeled as a JSON object { "genre": "Crime", "language": "English", "country": "USA", "runtime": 170, "title": "Scarface", "year": 1983 } { "_index": "imdb", "_type": "movie", "_id": "u17o8zy9RcKg6SjQZqQ4Ow", "_version": 1, "exists": true, "_source": { "genre": "Crime", "language": "English", "country": "USA", "runtime": 170, "title": "Scarface", "year": 1983 } }
  • 8. The Basics - Document Structure • Document Metadata fields • _id • _type : mapping type • _source : enabled/disabled • _timestamp • _ttl • _size : size of uncompressed _source • _version 8
  • 9. The Basics - Document Structure • Mapping: • ES will auto-map fields • You can specify mapping, if needed • Data Types: • String • Analyzers, Tokenizers, Filters • Number • Int, long, float, double, short, byte • Boolean • Datetime • formatted • geo_point, geo_shape, ip • Attachment (requires plugin) 9
  • 10. Lucene – Inverted Index • Which presidential speeches contain the word “fair”? • Go over every speech, word by word, and mark the speeches that contain it • Linear in the number of words • Fails at large scale
  • 11. Lucene – Inverted Index • Inverting Obama • Take all the speeches • Break them down by word (tokenize) • For each word, store the IDs of the speeches • Sort all words (tokens) • Searching • Finding the word is fast • Iterate over document IDs that are referenced 11 Token Doc Frequency Doc IDs Jobs 2 4,8 Fair 5 1,2,4,8,42 Bush 300 1,2,3,4,5,6, …
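The inverting steps on this slide (tokenize, store document IDs per token, sort the tokens) can be sketched in a few lines of Python. This is only an illustration of the idea, not how Lucene implements it; the `tokenize` helper and the sample speech texts are made up for the example.

```python
from collections import defaultdict

PUNCTUATION = ".,;:!?\"'()"

def tokenize(text):
    """Lowercase, split on whitespace, and strip basic punctuation."""
    words = (w.strip(PUNCTUATION).lower() for w in text.split())
    return [w for w in words if w]

def build_inverted_index(docs):
    """Map each token to the sorted list of document IDs containing it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            postings[token].add(doc_id)
    # Sort tokens so a lookup could use binary search in a real structure.
    return {token: sorted(ids) for token, ids in sorted(postings.items())}

# Hypothetical speeches, keyed by document ID.
speeches = {
    1: "A fair shot for everyone",
    2: "Fair pay and fair play",
    4: "Jobs and a fair economy",
    8: "More jobs, fair wages",
}

index = build_inverted_index(speeches)
print(index["fair"])       # [1, 2, 4, 8]
print(index["jobs"])       # [4, 8]
print(len(index["fair"]))  # document frequency: 4
```

Searching becomes a dictionary lookup followed by iteration over the stored document IDs, instead of a scan over every word of every speech.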
  • 12. Lucene – Inverted Index • Not an algorithm • Implementations vary 12
  • 13. Cluster Topology • 4 Node Cluster • Index Configuration: • “A”: 2 Shards, 1 Replica • “B”: 3 Shards, 1 Replica 13 A1 A2 B2 B2 B1 B3 B1 A1 A2 B3
  • 14. The Basics - Shards • Paths… • Primary Shard: • First time Indexing • Index has 1..N primary shards (default: 5) • # Not changeable once index created • Replica Shard: • Copy of the primary shard • Can be changed later • Each primary has 0..N replicas • HA: • Promoted to primary if primary fails • Get/Search handled by primary||replica 14
  • 15. The Basics - Shards • Shard Stages • UNASSIGNED • INITIALIZING • STARTED • RELOCATING • Viewed in Cluster State • Routing table : from indices perspective • Routing nodes 15
  • 16. The Basics - Searching • How it works: • Search request hits a node • Node broadcasts to every shard in the index (primary & replica) • Each shard performs query • Each shard returns results • Results merged, sorted, and returned to client. • Problems: • ES has no idea where your document is • Randomly distributed around cluster • Broadcast query to 100 nodes • Performance degrades 16
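The broadcast-and-merge flow above can be sketched as follows. This is a toy model under stated assumptions: each shard is a plain dictionary, the score is a simple term count (not Lucene's scoring), and the function names are hypothetical.

```python
import heapq

def search_shard(shard_docs, term, size):
    """Return the top `size` (score, doc_id) hits from one shard."""
    hits = []
    for doc_id, text in shard_docs.items():
        score = text.lower().split().count(term)  # toy relevance score
        if score > 0:
            hits.append((score, doc_id))
    return heapq.nlargest(size, hits)

def search(shards, term, size=3):
    """Query every shard, then merge and sort the partial results."""
    per_shard = [search_shard(shard, term, size) for shard in shards]
    merged = heapq.nlargest(size, (hit for hits in per_shard for hit in hits))
    return [doc_id for _score, doc_id in merged]

# Two hypothetical shards of one index.
shards = [
    {"a": "star wars star", "b": "scarface"},
    {"c": "star trek", "d": "a star is born star star"},
]

print(search(shards, "star", size=2))  # ['d', 'a']
```

Every shard does work for every query, which is why the cost grows with the number of shards that have to be contacted.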
  • 17. The Basics - Shards • Shard Allocation Awareness • cluster.routing.allocation.awareness.attributes: rack_id • Example: • 2 Nodes with node.rack_id=rack_one • Create Index 5 shards / 1 replica (10 shards) • Add 2 Nodes with node.rack_id=rack_two • Shards RELOCATE to even distribution • Primary & Replica will NOT be on the same rack_id value. • Shard Allocation Filtering • node.tag=val1 • index.routing.allocation.include.tag:val1,val2 17 curl -XPUT localhost:9200/newIndex/_settings -d '{ "index.routing.allocation.include.tag" : "val1,val2" }'
  • 18. The Basics - Routing 18 curl -XPOST localhost:9200/crunchbase/person/1?routing=xerox -d '{ ... }' curl -XPOST localhost:9200/crunchbase/_search?routing=xerox -d '{ "query" : { "filtered" : { "query" : { ... },} } }' Routing can be used to constrain the shards being searched on, otherwise, all shards will be hit
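Routing works because the target shard is derived from a hash of the routing value (the document ID by default) modulo the number of primary shards. The sketch below illustrates that with `zlib.crc32` as a stand-in hash; ElasticSearch uses its own hash function, so the shard numbers here will not match a real cluster.

```python
import zlib

def shard_for(routing_value, num_primary_shards):
    """Pick a shard as hash(routing) % number_of_primary_shards.

    zlib.crc32 is a stand-in; the hash used by ElasticSearch differs.
    """
    digest = zlib.crc32(routing_value.encode("utf-8"))
    return digest % num_primary_shards

NUM_PRIMARY_SHARDS = 5  # the default mentioned on the shards slide

# Indexing and searching with the same routing value hit the same shard.
index_shard = shard_for("xerox", NUM_PRIMARY_SHARDS)
search_shard = shard_for("xerox", NUM_PRIMARY_SHARDS)
print(index_shard == search_shard)  # True
```

The same formula also shows why the number of primary shards cannot change after index creation: a different modulus would send existing routing values to different shards.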
  • 19. Discovery • Nodes discover each other using multicast. • Unicast is an option • Each cluster has an elected master node • Beware of split-brain • discovery.zen.minimum_master_nodes • N/2+1, where N>2 • N: # of master nodes in cluster 19 discovery.zen.ping.multicast.enabled: false discovery.zen.ping.unicast.hosts: ["host1", "host2:port", "host3"]
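The N/2+1 rule is a majority quorum: a side of a network split may elect a master only if it holds more than half of the master-eligible nodes. A minimal sketch of the arithmetic, with hypothetical function names:

```python
def minimum_master_nodes(master_eligible_nodes):
    """Quorum size: strictly more than half of the master-eligible nodes."""
    return master_eligible_nodes // 2 + 1

def both_sides_can_elect(master_eligible_nodes, partition_a_size):
    """True if a two-way network split leaves a quorum on both sides."""
    quorum = minimum_master_nodes(master_eligible_nodes)
    partition_b_size = master_eligible_nodes - partition_a_size
    return partition_a_size >= quorum and partition_b_size >= quorum

for n in (3, 4, 5):
    print(n, minimum_master_nodes(n))  # 3 2, 4 3, 5 3

# With the quorum set, no two-way split of 5 nodes yields two masters.
print(any(both_sides_can_elect(5, a) for a in range(6)))  # False
```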
  • 20. Nodes • Master node handles cluster-wide (Meta-API) events: • Node participation • New indices create/delete • Re-Allocation of shards • Data Nodes • Indexing / Searching operations • Client Nodes • REST calls • Light-weight load balancers • Beware of Heap Size • ES_HEAP_SIZE: ~1/2 machine memory 20
  • 21. Cluster State • Cluster State • Node Membership • Indices Settings and Mappings (Types) • Shard Allocation Table • Shard State • curl -XGET 'http://localhost:9200/_cluster/state?pretty=1'
  • 22. Cluster State • Changes in State published from Master to other nodes 22 PUT /newIndex 2 3 1 (M) CS1 CS1 CS1 2 3 1 (M) CS2 CS1 CS1 2 3 1 (M) CS2 CS2 CS2
  • 23. REST API 23 • Building IMDB • Two Indexes
  • 24. REST API • Create Index • action.auto_create_index: 0 • Index Document • Dynamic type mapping • Versioning • ID specification • Parent / Child (/1122?parent=1111) • Explicit Refresh (?refresh=1) • Timeout flag (?timeout=5m) 24
  • 25. REST API – Versioning • Every document is Versioned • Version assigned on creation • Version number can be assigned • Re-Index, Update, and Delete update Version 25
  • 26. REST API - Update • Update using partial data • Partial doc merged with existing • Fails if document doesn’t exist • “Upsert” data used to create a doc if it doesn’t exist { "upsert" : { "title": "Blade Runner" } }
  • 27. REST API • Exists • No overhead in loading • Status Code Result • Delete • Get • Multi-Get { "docs" : [ { "_id" : "1", "_index" : "imdb", "_type" : "movie" }, { "_id" : "5", "_index" : "oldmovies", "_type" : "movie", "_fields" : ["title", "genre"] } ] }
  • 28. REST API - Search • Free Text Search • URL Request • http://localhost:9200/imdb/movie/_search?q=scar* • Complex Query • http://localhost:9200/imdb/movie/_search?q=scarface+OR+star • http://localhost:9200/imdb/movie/_search?q=(scarface+OR+star)+AND+year:[1981+TO+1984] • Term, Boolean, range, fuzzy, etc…
  • 29. REST API - Search • Search Types: • http://localhost:9200/imdb/movie/_search?q=(scarface+OR+star)+AND+year:[1981+TO+1984]&search_type=count • http://localhost:9200/imdb/movie/_search?q=(scarface+OR+star)+AND+year:[1981+TO+1984]&search_type=query_then_fetch • Query and Fetch: • Executes on all shards and returns results • Query then Fetch: • Executes on all shards. Only the information needed to rank/sort is returned; then only the relevant shards are asked for data
  • 30. REST API – Query DSL http://localhost:9200/imdb/movie/_search?q=(scarface+OR+star)+AND+year:[1981+TO+1984] Becomes… curl -XPOST 'localhost:9200/_search?pretty' -d '{ "query" : { "bool" : { "must" : [ { "query_string" : { "query" : "scarface OR star" } }, { "range" : { "year" : { "gte" : 1981, "lte" : 1984 } } } ] } } }'
  • 31. REST API – Query DSL • Query string requests use Lucene query syntax • Limited • Error-prone • Instead use “match” query curl -XPOST 'localhost:9200/_search?pretty' -d '{ "query" : { "bool" : { "must" : [ { "match" : { "message" : "scarface star" } }, { "range" : { "year" : { "gte" : 1981 } } } ] … The match query automatically builds a boolean query
  • 32. REST API – Query DSL • Match Query • Boolean Query • Must: document must match query • Must_not: document must not match query • Should: document doesn’t have to match • If it matches… higher score • Compound queries { "bool":{ "must":[ { "match":{ "color":"blue" } }, { "match":{ "title":"shirt" } } ], "must_not":[ { "match":{ "size":"xxl" } } ], "should":[ { "match":{ "textile":"cotton" } } ] } } { "match":{ "title":{ "type":"phrase", "query":"quick fox", "slop":1 } } }
  • 33. REST API – Query DSL • Range Query • Numeric / Date Types • Prefix/Wildcard Query • Match on partial terms • Fuzzy Query • Similar looking text matched. • RegExp Query 33 { "range":{ "founded_year":{ "gte":1990, "lt":2000 } } }
  • 34. REST API – Query DSL • Geo_bbox • Bounding box filter • Geo_distance • Geo_distance_range { "query":{ "filtered":{ "query":{ "match_all":{ } }, "filter":{ "geo_bbox":{ "location":{ "top_left":{ "lat":40.73, "lon":-74.1 }, "bottom_right":{ "lat":40.717, "lon":-73.99 } … { "query":{ "filtered":{ "query":{ "match_all":{ } }, "filter":{ "geo_distance":{ "distance":"400km", "location":{ "lat":40.73, "lon":-74.1 } }
  • 35. REST API – Bulk Operations • Bulk API • Minimize round trips with index/delete ops • Individual response for every request action • In order • Failure of one action will not stop subsequent actions. • localhost:9200/_bulk • No pretty-printing. Use \n { "delete" : { "_index" : "imdb", "_type" : "movie", "_id" : "2" } }\n { "index" : { "_index" : "imdb", "_type" : "actor", "_id" : "1" } }\n { "first_name" : "Tony", "last_name" : "Soprano" }\n ... { "update" : { "_index" : "imdb", "_type" : "movie", "_id" : "3" } }\n { "doc" : { "title" : "Blade Runner" } }\n
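Because the bulk body is newline-delimited JSON with no pretty-printing, it is easiest to generate it rather than write it by hand. A minimal sketch, assuming a hypothetical `bulk_body` helper and reusing the actions from the slide; it only builds the request body and does not send it.

```python
import json

def bulk_body(actions):
    """Serialize (action_metadata, source_or_None) pairs as one JSON per line."""
    lines = []
    for metadata, source in actions:
        lines.append(json.dumps(metadata))    # action line
        if source is not None:                # delete has no source line
            lines.append(json.dumps(source))
    return "\n".join(lines) + "\n"            # body must end with a newline

actions = [
    ({"delete": {"_index": "imdb", "_type": "movie", "_id": "2"}}, None),
    ({"index": {"_index": "imdb", "_type": "actor", "_id": "1"}},
     {"first_name": "Tony", "last_name": "Soprano"}),
    ({"update": {"_index": "imdb", "_type": "movie", "_id": "3"}},
     {"doc": {"title": "Blade Runner"}}),
]

body = bulk_body(actions)
print(body, end="")
```

The resulting string can be posted to localhost:9200/_bulk with any HTTP client.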
  • 36. Percolate API • Reversing Search • Store queries and filter (percolate) documents through them. curl -XPUT localhost:9200/_percolator/stocks/alert-on-nokia -d '{ "query" : { "bool" : { "must" : [ { "term" : { "company" : "NOK" }}, { "range" : { "value" : { "lt" : "2.5" }}} ] } } }' curl -XGET localhost:9200/stocks/stock/_percolate -d '{ "doc" : { "company" : "NOK", "value" : 2.4 } }'
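The "reversed search" idea can be illustrated without a cluster: queries are registered up front, and each incoming document is tested against all of them. In this sketch the stored queries are plain Python predicates rather than Query DSL, and the second query name is hypothetical.

```python
def percolate(registered_queries, doc):
    """Return the names of all registered queries that match the document."""
    return sorted(name for name, matches in registered_queries.items()
                  if matches(doc))

registered_queries = {
    # Mirrors the slide: company is NOK and value is below 2.5.
    "alert-on-nokia": lambda d: d.get("company") == "NOK"
                                and d.get("value", float("inf")) < 2.5,
    # Hypothetical second query, added only for contrast.
    "alert-on-xerox": lambda d: d.get("company") == "XRX",
}

matched = percolate(registered_queries, {"company": "NOK", "value": 2.4})
print(matched)  # ['alert-on-nokia']
```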
  • 37. Java Integration • Client list: http://www.elasticsearch.org/guide/clients/ • Java Client, JEST • Limited • https://github.com/searchbox-io/Jest • Spring Data: • Uses TransportClient • Implementation of ElasticsearchRepository aligns with generic Repository interfaces. • ElasticsearchCrudRepository extends PagingAndSortingRepository • https://github.com/spring-projects/spring-data-elasticsearch @Document(indexName = "book", type = "book", indexStoreType = "memory", shards = 1, replicas = 0, refreshInterval = "-1") public class Book { … } public interface ElasticSearchBookRepository extends ElasticsearchRepository<Book, String> { }
  • 38. Stuff I didn’t cover… • Analyzers • Tokenizers • Token Filters • Rivers • RabbitMQ, MySQL, JDBC $ curl -XGET 'localhost:9200/_analyze?tokenizer=whitespace&filters=lowercase,stop&pretty=1' -d ' The quick Fox Jumped ' { "tokens" : [ { "token" : "quick", "start_offset" : 5, "end_offset" : 10, "type" : "word", "position" : 2 }, { "token" : "fox", "start_offset" : 11, "end_offset" : 14, "type" : "word", "position" : 3 }, { "token" : "jumped", "start_offset" : 15, "end_offset" : 21, "type" : "word", "position" : 4 } ] }
  • 39. B’what about Mongo? • Mongo: • General purpose DB • ElasticSearch: • Distributed text search engine … that’s all I have to say about that. 39