Elasticsearch:
first steps with an
Aggregate-oriented
database
Jug Roma
28/11/2013
Matteo Moci
Me
Matteo Moci
@matteomoci
http://mox.fm
Software Engineer
R&D, new product development
Agenda
• 2 Use cases
• Elasticsearch Basics
• Data Design for scaling
Social Media Analytics Platform
for Marketing Agencies
Scenario

• Using Elasticsearch as:
• Analytics engine
• Aggregate repository
Use case 1

• Count how values are distributed over time
Before

• ~10M documents
• Heaviest query: ~10 minutes
• Our staff had a problem
After

• ~10M documents
• Heaviest query: ~1 second (also with a larger dataset)
Use case 2
• Aggregate-oriented repository
• ...as in DDD

http://ptgmedia.pearsoncmg.com/images/chap10_9780321834577/elem...
Elasticsearch
Distributed RESTful search and analytics
real time data and analytics
distributed
high availability
multi te...
Elasticsearch basics
• Install
• API
• Types mapping
• Facets
• Relations
Install
$ wget https://download.elasticsearch.org/...
$ tar -xf elasticsearch-0.90.7.tar.gz
Run!
$ ./elasticsearch-0.90.7/bin/elasticsearch -f
$ ./elasticsearch-0.90.7/bin/elasticsearch -f
[Diagram, built up over several slides: a cluster labelled es with nodes Hulk and Thor]
Index a document
$ curl -X PUT localhost:9200/products/product/1 -d '{
  "name" : "Camera"
}'
Search
$ curl -X GET 'localhost:9200/products/product/_search?q=Camera'
Shards and Replicas
[Diagram, built up over five slides: cluster es with nodes Hulk and Thor; the Products index has shards 1 and 2, each shown twice, ending with one copy of each shard on each node]
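The layout the diagram ends on can be sketched in Python. The sketch shows one constraint behind it: a replica is never placed on the node that holds its primary. The `allocate` helper and its round-robin placement are illustrative assumptions, not Elasticsearch's actual allocator.

```python
def allocate(num_shards, nodes):
    """Place one primary and one replica per shard on different nodes.

    Hypothetical round-robin placement, for illustration only.
    """
    if len(nodes) < 2:
        raise ValueError("a replica needs a second node")
    layout = {node: [] for node in nodes}
    for shard in range(1, num_shards + 1):
        primary = nodes[(shard - 1) % len(nodes)]
        replica = nodes[shard % len(nodes)]  # the next node, never the primary's node
        layout[primary].append((shard, "primary"))
        layout[replica].append((shard, "replica"))
    return layout


layout = allocate(2, ["Hulk", "Thor"])
for node, shards in layout.items():
    print(node, shards)
```

With two nodes, each node ends up holding one copy of every shard, so either node can serve the whole index if the other goes down.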
Integration
[Diagram, built up over two slides: a TransportClient connecting to nodes Hulk and Thor, each on port 9300]
Async Java API
this.client.prepareGet("documents", "document", id)
//async, non blocking APIs
//use a listener to handle r...
Mapping
Mappings define how primitive
types are stored and analyzed
Mapping
• JSON data is parsed on indexing
• Mapping is done on first field indexing
• Inferred if not configured (!)
• Types:...
"text": {
"type": "multi_field",
"fields": {
"text": {
"type": "string",
"index": "analyzed",
"index_analyzer": "whitespac...
Mapping - lessons
• schema can evolve (e.g. add fields)
• inferred if not specified (!)
• worst case: reindex
• use aliases ...
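The alias advice can be illustrated with a toy in-memory model. Clients read through an alias, a new index is built with the changed schema, and the alias is then repointed in a single step. The `Cluster` class and the index names `products_v1` and `products_v2` are hypothetical; this sketches the idea, not the Elasticsearch aliases API.

```python
class Cluster:
    """Toy stand-in for a cluster: named indices plus aliases pointing at them."""

    def __init__(self):
        self.indices = {}  # index name -> list of documents
        self.aliases = {}  # alias name -> index name

    def search(self, name):
        # Resolve an alias to its index; otherwise treat the name as an index.
        return self.indices[self.aliases.get(name, name)]

    def reindex(self, source, target, transform):
        # Build the new index while the old one keeps serving reads.
        self.indices[target] = [transform(doc) for doc in self.indices[source]]

    def switch_alias(self, alias, target):
        # One assignment: readers see the old index or the new one, never a gap.
        self.aliases[alias] = target


cluster = Cluster()
cluster.indices["products_v1"] = [{"name": "Camera"}]
cluster.aliases["products"] = "products_v1"

cluster.reindex(
    "products_v1",
    "products_v2",
    lambda doc: {**doc, "name_length": len(doc["name"])},
)
before = cluster.search("products")
cluster.switch_alias("products", "products_v2")
after = cluster.search("products")
```

Because clients only ever use the alias name, the reindex in the worst case stays invisible to them.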
Search with Facets
final TermsFacetBuilder userFacet =
FacetBuilders.termsFacet(MENTION_FACET_NAME)
.field(USER_ID).size(m...
Query

Facets
Date Histogram Facet
The histogram facet works with numeric data by
building a histogram across intervals of the field valu...
{
    "query" : {
        "match_all" : {}
    },
    "facets" : {
        "histo1" : {
            ...
        }
    }
}
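The bucketing a histogram facet performs can be sketched in a few lines of Python. Each value is rounded down to a multiple of the interval, and values that share a key are counted together. The follower counts are made-up sample data and the interval of 10 is an arbitrary choice.

```python
from collections import Counter


def histogram(values, interval):
    """Count values per bucket, keyed by the bucket's lower bound."""
    counts = Counter((value // interval) * interval for value in values)
    return dict(sorted(counts.items()))


followers = [3, 7, 12, 18, 19, 25, 40]  # made-up sample data
buckets = histogram(followers, 10)
print(buckets)
```

Buckets with no values are simply absent from the result, which is why the key 30 does not appear for this sample.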
Facets - lessons
• Bug in 0.90.x: https://github.com/elasticsearch/elasticsearch/issues/1305*
• Solutions:
• use 1 shar...
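A small Python sketch of why merging per-shard top-N lists can miss a term, assuming the linked issue concerns terms facet counts on indices with more than one shard. It also shows why a single shard, or asking each shard for more terms than are finally needed, changes the result. The data is made up and `sharded_top_terms` is a hypothetical helper, not Elasticsearch code.

```python
from collections import Counter


def sharded_top_terms(shards, size, shard_size):
    """Merge each shard's local top `shard_size` terms, then keep the top `size`."""
    merged = Counter()
    for shard in shards:
        for term, count in Counter(shard).most_common(shard_size):
            merged[term] += count
    return merged.most_common(size)


# "y" is second on each shard but the most frequent term overall.
shard_a = ["x"] * 10 + ["y"] * 9
shard_b = ["z"] * 10 + ["y"] * 9

naive = sharded_top_terms([shard_a, shard_b], size=1, shard_size=1)
wider = sharded_top_terms([shard_a, shard_b], size=1, shard_size=2)
single = sharded_top_terms([shard_a + shard_b], size=1, shard_size=1)
```

Here `naive` never sees "y" because neither shard reports it, while `wider` and `single` both return it with its true count of 18.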
Analyzers
A Lucene analyzer consists of a tokenizer and
an arbitrary number of filters (+ char filters)
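The tokenizer-then-filters structure can be sketched in plain Python. The function names are hypothetical; the lowercase filter is only an illustrative extra step, and the shingle step mimics a shingle filter with unigram output disabled rather than reproducing Lucene's implementation.

```python
def whitespace_tokenizer(text):
    """Split text into tokens on whitespace."""
    return text.split()


def lowercase_filter(tokens):
    """Token filter: lowercase every token."""
    return [token.lower() for token in tokens]


def shingle_filter(size):
    """Build a token filter emitting word n-grams of exactly `size` tokens."""
    def apply(tokens):
        return [" ".join(tokens[i:i + size]) for i in range(len(tokens) - size + 1)]
    return apply


def analyze(text, tokenizer, filters):
    """Run the tokenizer once, then each filter in order."""
    tokens = tokenizer(text)
    for token_filter in filters:
        tokens = token_filter(tokens)
    return tokens


bigrams = analyze(
    "First steps with Elasticsearch",
    whitespace_tokenizer,
    [lowercase_filter, shingle_filter(2)],
)
print(bigrams)
```

Swapping `shingle_filter(2)` for `shingle_filter(3)` gives the trigram variant; an input with fewer tokens than the shingle size produces no tokens at all.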
{
"index":{
"analysis":{
"filter":{
"bigram_shingle_filter":{
"type":"shingle",
"max_shingle_size":2,
"min_shingle_size":2...
Relations between Documents
Author 1 : N Book
• nested: faster reads, update needs reindex, cross object match
• parent...
Nested Documents
Specify that the Book type is “nested” in the Author mapping
We can query Authors with a query on properties
of nest...
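Why the nested type matters for such a query can be sketched in Python. With plain object arrays the field values of all books are pooled, so conditions can match across different books; with nested documents every condition must hold within the same book. The author document and both helper functions are made up for illustration.

```python
author = {
    "name": "Example Author",  # made-up document
    "books": [
        {"publisher": "penguin", "genre": "fantasy"},
        {"publisher": "tor", "genre": "scifi"},
    ],
}


def matches_flattened(doc, publisher, genre):
    """Plain object arrays: field values are pooled across all books."""
    publishers = {book["publisher"] for book in doc["books"]}
    genres = {book["genre"] for book in doc["books"]}
    return publisher in publishers and genre in genres


def matches_nested(doc, publisher, genre):
    """Nested documents: both conditions must hold within the same book."""
    return any(
        book["publisher"] == publisher and book["genre"] == genre
        for book in doc["books"]
    )


flattened_hit = matches_flattened(author, "penguin", "scifi")
nested_hit = matches_nested(author, "penguin", "scifi")
```

This author has no Penguin scifi book, yet the flattened check reports a match; the nested check does not.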
curl -XGET localhost:9200/authors/nested_author/_search -d '
{
"query": {
"filtered": {
"query": {"match_all": {}},
"filt...
Parent and Child
Indexing happens separately
Specify _parent type in Child mapping (Book)
When indexing Books, specify id ...
curl -XPOST localhost:9200/authors/book/_mapping -d '{
"book":{
"_parent": {"type": "bare_author"}
}
}'

curl -XPOST local...
Parent and Child query
curl -XPOST localhost:9200/authors/bare_author/_search -d '{
"query": {
"has_child": {
"type": "bo...
Data Design
Index Configurations
• One index “per user”
• Single index
• SI + Routing: 1 index + custom doc routing to s...
One Index per user
[Diagram: nodes Hulk and Thor holding shards User1 s0, User1 s1 and User2 s0]

+ different sharding per user
- small users own (and cost) at...
Single Index
[Diagram: nodes Hulk and Thor holding shards Users s0, Users s3 and Users s2]

+ filter by user id, support growth
- search hits all shards
Single Index + routing
[Diagram: nodes Hulk and Thor holding shards Users s0, Users s3 and Users s2]

+ a user’s data is all in one shard,
allows large overall...
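Custom routing can be sketched as hashing a routing key, here the user id, onto a fixed number of shards, so that all of a user's documents land on the same shard. `zlib.crc32` is a stand-in chosen because it is deterministic; Elasticsearch uses its own hash function, and the document ids below are made up.

```python
import zlib


def shard_for(routing_key, num_shards):
    """Map a routing key to a shard number.

    crc32 is a stand-in hash for illustration, not the one Elasticsearch uses.
    """
    return zlib.crc32(routing_key.encode("utf-8")) % num_shards


NUM_SHARDS = 4
docs = [("user1", "tweet-1"), ("user2", "tweet-2"), ("user1", "tweet-3")]

# Route by user id rather than by document id.
placement = {doc_id: shard_for(user_id, NUM_SHARDS) for user_id, doc_id in docs}
print(placement)
```

A search that supplies the same routing key only needs to visit that one shard, instead of hitting every shard of the index.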
Index per time range
[Diagram: nodes Hulk and Thor holding shards 2013_01 s1, 2013_01 s2 and 2013_02 s1]

+ allows change in future indices
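With one index per time window, a query over a date range only has to target the indices that cover it. A minimal Python sketch, assuming monthly indices named like the `2013_01` labels in the diagram; the `monthly_indices` helper is hypothetical.

```python
from datetime import date


def monthly_indices(start, end):
    """Return one index name per calendar month between two dates, inclusive."""
    names = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        names.append(f"{year}_{month:02d}")
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return names


names = monthly_indices(date(2012, 12, 15), date(2013, 2, 3))
print(",".join(names))
```

Each new month's index is created fresh, which is what lets its settings differ from those of earlier months.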
Data Design - lessons
Test, test, test your use case!
Take a single node with one shard and
throw load at it, checking the...
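The reason to allocate more shards than nodes can be sketched with a toy round-robin distribution. In this version the shard count is fixed when the index is created, so added nodes only help while there are shards left to spread. The `distribute` helper and its placement policy are assumptions for illustration.

```python
def distribute(num_shards, num_nodes):
    """Return how many shards each node holds under round-robin placement."""
    per_node = [0] * num_nodes
    for shard in range(num_shards):
        per_node[shard % num_nodes] += 1
    return per_node


two_nodes = distribute(4, 2)
four_nodes = distribute(4, 4)
six_nodes = distribute(4, 6)
print(two_nodes, four_nodes, six_nodes)
```

With four shards, growing from two to four nodes halves the load per node, but the fifth and sixth nodes receive nothing.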
...ES has lots of other features!
• Bulk operations
• Percolator (alerts, classification, …)
• Suggesters (“Did you mean …?...
Thanks!
@matteomoci
http://mox.fm
Elasticsearch first-steps


Elasticsearch: first steps with an aggregate-oriented database

Published in: Technology

Elasticsearch first-steps

  1. 1. Elasticsearch: first steps with an Aggregate-oriented database Jug Roma 28/11/2013 Matteo Moci
  2. 2. Me Matteo Moci @matteomoci http://mox.fm Software Engineer R&D, new product development
  3. 3. Agenda • 2 Use cases • Elasticsearch Basics • Data Design for scaling
  4. 4. Social Media Analytics Platform for Marketing Agencies
  5. 5. Scenario • Using Elasticsearch as: • Analytics engine • Aggregate repository
  6. 6. Use case 1 • Count how values are distributed over time
  7. 7. Before • ~10M documents • Heaviest query: ~10 minutes • Our staff had a problem
  8. 8. After • ~10M documents • Heaviest query: ~1 second (also with a larger dataset)
  9. 9. Use case 2 • Aggregate-oriented repository • ...as in DDD http://ptgmedia.pearsoncmg.com/images/chap10_9780321834577/elementLinks/10fig05.jpg
  10. 10. Elasticsearch Distributed RESTful search and analytics real time data and analytics distributed high availability multi tenancy full-text search schema free RESTful, JSON API
  11. 11. Elasticsearch basics • Install • API • Types mapping • Facets • Relations
  12. 12. Install $ wget https://download.elasticsearch.org/... $ tar -xf elasticsearch-0.90.7.tar.gz
  13. 13. Run!
  14. 14. Run! $ ./elasticsearch-0.90.7/bin/elasticsearch -f es Hulk
  15. 15. Run! $ ./elasticsearch-0.90.7/bin/elasticsearch -f $ ./elasticsearch-0.90.7/bin/elasticsearch -f es Hulk
  16. 16. Run! $ ./elasticsearch-0.90.7/bin/elasticsearch -f $ ./elasticsearch-0.90.7/bin/elasticsearch -f es Hulk Thor
  17. 17. Index a document $ curl -X PUT localhost:9200/products/product/1 -d '{ "name" : "Camera" }'
  18. 18. Search $ curl -X GET 'localhost:9200/products/product/_search?q=Camera'
  19. 19. Shards and Replicas es Hulk Products 1 2 1 2
  20. 20. Shards and Replicas es Hulk Products Thor 1 2 1 2
  21. 21. Shards and Replicas es Hulk Products Thor Products 1 2 1 2
  22. 22. Shards and Replicas es Hulk Products Thor Products 2 1 1 2
  23. 23. Shards and Replicas es Hulk Products Thor Products 2 1 2 1
  24. 24. Integration Hulk Thor 9300 9300
  25. 25. Integration TransportClient Hulk Thor 9300 9300
  26. 26. Async Java API this.client.prepareGet("documents", "document", id) //async, non blocking APIs //use a listener to handle result. non-blocking .execute(new ActionListener<GetResponse>() { @Override public void onResponse(GetResponse getFields) { // } @Override public void onFailure(Throwable e) { // }
  27. 27. Mapping Mappings define how primitive types are stored and analyzed
  28. 28. Mapping • JSON data is parsed on indexing • Mapping is done on first field indexing • Inferred if not configured (!) • Types: float, long, boolean, date (+formatting), object, nested • String type can have arbitrary analyzers • Fields can be split up in more fields
  29. 29. "text": { "type": "multi_field", "fields": { "text": { "type": "string", "index": "analyzed", "index_analyzer": "whitespace", "analyzer": "whitespace" }, "text_bigram": { "type": "string", "index": "analyzed", "index_analyzer": "bigram_analyzer", "search_analyzer": "bigram_analyzer" }, "text_trigram": { "type": "string", "index": "analyzed", "index_analyzer": "trigram_analyzer", "search_analyzer": "trigram_analyzer"
  30. 30. Mapping - lessons • schema can evolve (e.g. add fields) • inferred if not specified (!) • worst case: reindex • use aliases to enable zero downtime
  31. 31. Search with Facets final TermsFacetBuilder userFacet = FacetBuilders.termsFacet(MENTION_FACET_NAME) .field(USER_ID).size(maxUsersAmount); SearchResponse response; response = client.prepareSearch(Indices.USERS) .setTypes(USER_TYPE) .setQuery(someQuery).setSize(0) .setSearchType(SearchType.COUNT) .addFacet(userFacet).execute().actionGet() ; final TermsFacet facets = (TermsFacet) response.getFacets().facetsAsMap() .get(MENTION_FACET_NAME);
  32. 32. Query Facets
  33. 33. Date Histogram Facet The histogram facet works with numeric data by building a histogram across intervals of the field values. Each value is placed in a “bucket”
  34. 34. { "query" : { "match_all" : {} }, "facets" : { "histo1" : { "histogram" : { "field" : "followers", "interval" : 10 } } } }
  35. 35. Facets - lessons • Bug in 0.90.x: https://github.com/elasticsearch/elasticsearch/issues/1305* • Solutions: • use 1 shard • ask for top 100 instead of 10 *will be solved in 1.0 with aggregation module
  36. 36. Analyzers A Lucene analyzer consists of a tokenizer and an arbitrary number of filters (+ char filters)
  37. 37. { "index":{ "analysis":{ "filter":{ "bigram_shingle_filter":{ "type":"shingle", "max_shingle_size":2, "min_shingle_size":2, "output_unigrams":"false", "output_unigrams_if_no_shingles":"false" }, "trigram_shingle_filter":{ "type":"shingle", "max_shingle_size":3, "min_shingle_size":3, "output_unigrams":"false", "output_unigrams_if_no_shingles":"false" } }, "analyzer":{ "bigram_analyzer":{ "tokenizer":"whitespace", "filter":[ "standard", "bigram_shingle_filter" ] }, "trigram_analyzer":{ "tokenizer":"whitespace", "filter":[ "standard", "trigram_shingle_filter" ] } } ... } } }
  38. 38. Relations between Documents Author 1 N Book • nested: faster reads, update needs reindex, cross object match • parent/child: same shard, no reindex on update, difficult sorting
  39. 39. Nested Documents Specify that the Book type is “nested” in the Author mapping We can query Authors with a query on properties of nested Books “Authors who published at least a book with Penguin, in scifi genre”
  40. 40. curl -XGET localhost:9200/authors/nested_author/ _search -d ' { "query": { "filtered": { "query": {"match_all": {}}, "filter": { "nested": { "path": "books", "query":{ "filtered": { "query": { "match_all": {}}, "filter": { "and": [ {"term": {"books.publisher": "penguin"}}, {"term": {"books.genre": "scifi"}} ]
  41. 41. Parent and Child Indexing happens separately Specify _parent type in Child mapping (Book) When indexing Books, specify id of Author
  42. 42. curl -XPOST localhost:9200/authors/book/_mapping -d '{ "book":{ "_parent": {"type": "bare_author"} } }' curl -XPOST localhost:9200/authors/book/1?parent=2 -d '{ "name": "Revelation Space", "genre": "scifi", "publisher": "penguin" }'
  43. 43. Parent and Child query curl -XPOST localhost:9200/authors/bare_author/ _search -d '{ "query": { "has_child": { "type": "book", "query" : { "filtered": { "query": { "match_all": {}}, "filter" : { "and": [ {"term": {"publisher": "penguin"}}, {"term": {"genre": "scifi"}} ]
  44. 44. Data Design Index Configurations • One index “per user” • Single index • SI + Routing: 1 index + custom doc routing to shards • Time: 1 index per time window * * we can search across indices
  45. 45. One Index per user Hulk Thor User1 s0 User1 s1 User2 s0 + different sharding per user - small users own (and cost) at least 1 shard
  46. 46. Single Index Hulk Thor Users s0 Users s3 Users s2 + filter by user id, support growth - search hits all shards
  47. 47. Single Index + routing Hulk Thor Users s0 Users s3 Users s2 + a user’s data is all in one shard, allows large overallocation
  48. 48. Index per time range Hulk Thor 2013_01 s1 2013_01 s2 2013_02 s1 + allows change in future indices
  49. 49. Data Design - lessons Test, test, test your use case! Take a single node with one shard and throw load at it, checking the shard capacity The shard is the scaling unit: overallocate to enable future scaling #shards > #nodes
  50. 50. ...ES has lots of other features! • Bulk operations • Percolator (alerts, classification, …) • Suggesters (“Did you mean …?”) • Index templates (Automatic index configuration) • Monitoring API (Amount of memory used, number of operations, …) • Plugins • ...
  51. 51. Thanks! @matteomoci http://mox.fm
