Elasticsearch first-steps

Elasticsearch: first steps with an aggregate-oriented database

Elasticsearch first-steps Presentation Transcript

  • 1. Elasticsearch: first steps with an Aggregate-oriented database Jug Roma 28/11/2013 Matteo Moci
  • 2. Me Matteo Moci @matteomoci http://mox.fm Software Engineer R&D, new product development
  • 3. Agenda • 2 Use cases • Elasticsearch Basics • Data Design for scaling
  • 4. Social Media Analytics Platform for Marketing Agencies
  • 5. Scenario • Using Elasticsearch as: • Analytics engine • Aggregate repository
  • 6. Use case 1 • count values distribution over time
  • 7. Before • ~10M documents • Heaviest query: ~10 minutes • Our staff had a problem
  • 8. After • ~10M documents • Heaviest query: ~1 second (also with a larger dataset)
  • 9. Use case 2 • Aggregate-oriented repository • ...as in DDD http://ptgmedia.pearsoncmg.com/images/chap10_9780321834577/elementLinks/10fig05.jpg
  • 10. Elasticsearch: distributed RESTful search and analytics • real-time data and analytics • distributed • high availability • multi-tenancy • full-text search • schema-free • RESTful JSON API
  • 11. Elasticsearch basics • Install • API • Types mapping • Facets • Relations
  • 12. Install $ wget https://download.elasticsearch.org/... $ tar -xf elasticsearch-0.90.7.tar.gz
  • 13. Run!
  • 14. Run! $ ./elasticsearch-0.90.7/bin/elasticsearch -f [diagram: cluster "es" with node Hulk]
  • 15. Run! $ ./elasticsearch-0.90.7/bin/elasticsearch -f $ ./elasticsearch-0.90.7/bin/elasticsearch -f [diagram: cluster "es" with node Hulk]
  • 16. Run! $ ./elasticsearch-0.90.7/bin/elasticsearch -f $ ./elasticsearch-0.90.7/bin/elasticsearch -f [diagram: cluster "es" with nodes Hulk and Thor]
  • 17. Index a document $ curl -X PUT localhost:9200/products/product/1 -d '{ "name" : "Camera" }'
  • 18. Search $ curl -X GET 'localhost:9200/products/product/_search?q=Camera'
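  • 19. Shards and Replicas [diagram: cluster "es"; node Hulk; index Products; shard labels 1 2 1 2]
  • 20. Shards and Replicas [diagram: cluster "es"; nodes Hulk and Thor; index Products; shard labels 1 2 1 2]
  • 21. Shards and Replicas [diagram: cluster "es"; nodes Hulk and Thor, each showing index Products; shard labels 1 2 1 2]
  • 22. Shards and Replicas [diagram: cluster "es"; nodes Hulk and Thor, each showing index Products; shard labels 2 1 1 2]
  • 23. Shards and Replicas [diagram: cluster "es"; nodes Hulk and Thor, each showing index Products; shard labels 2 1 2 1]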
  • 24. Integration [diagram: nodes Hulk and Thor, each labelled with port 9300]
  • 25. Integration [diagram: a TransportClient alongside nodes Hulk and Thor, each labelled with port 9300]
  • 26. Async Java API
      // Async, non-blocking API: pass a listener to handle the result.
      this.client.prepareGet("documents", "document", id)
          .execute(new ActionListener<GetResponse>() {
              @Override
              public void onResponse(GetResponse getFields) {
                  // handle the fetched document
              }

              @Override
              public void onFailure(Throwable e) {
                  // handle the failure
              }
          });
  • 27. Mapping Mappings define how primitive types are stored and analyzed
  • 28. Mapping • JSON data is parsed on indexing • A field's mapping is set the first time the field is indexed • Inferred if not configured (!) • Types: float, long, boolean, date (+formatting), object, nested • String fields can have arbitrary analyzers • A field can be indexed as several fields (multi_field)
  • 29. Multi-field mapping for "text":
      "text": {
        "type": "multi_field",
        "fields": {
          "text": {
            "type": "string",
            "index": "analyzed",
            "index_analyzer": "whitespace",
            "analyzer": "whitespace"
          },
          "text_bigram": {
            "type": "string",
            "index": "analyzed",
            "index_analyzer": "bigram_analyzer",
            "search_analyzer": "bigram_analyzer"
          },
          "text_trigram": {
            "type": "string",
            "index": "analyzed",
            "index_analyzer": "trigram_analyzer",
            "search_analyzer": "trigram_analyzer"
          }
        }
      }
  • 30. Mapping - lessons • schema can evolve (e.g. add fields) • inferred if not specified (!) • worst case: reindex • use aliases to enable zero downtime
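The "use aliases to enable zero downtime" lesson can be sketched as follows. This is a minimal illustration, assuming clients always search through an alias and that a reindex writes into a fresh index; the alias and index names (`products`, `products_v1`, `products_v2`) and the helper class are hypothetical. The helper only builds the JSON body one would POST to the `_aliases` endpoint so that the alias moves from the old index to the new one in a single request.

```java
class AliasSwap {
    // Builds the body for POST /_aliases: remove the alias from the old index
    // and add it to the new one in the same request, so readers never see a gap.
    static String swapBody(String alias, String oldIndex, String newIndex) {
        return "{ \"actions\": ["
            + " { \"remove\": { \"index\": \"" + oldIndex + "\", \"alias\": \"" + alias + "\" } },"
            + " { \"add\": { \"index\": \"" + newIndex + "\", \"alias\": \"" + alias + "\" } }"
            + " ] }";
    }

    public static void main(String[] args) {
        // Hypothetical names: "products" is the alias, v1/v2 are the physical indices.
        System.out.println(swapBody("products", "products_v1", "products_v2"));
    }
}
```

The point of the design is that applications only ever know the alias name, so the physical index behind it can be rebuilt with a new mapping and swapped in afterwards.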
  • 31. Search with Facets
      final TermsFacetBuilder userFacet =
          FacetBuilders.termsFacet(MENTION_FACET_NAME)
              .field(USER_ID)
              .size(maxUsersAmount);
      SearchResponse response;
      response = client.prepareSearch(Indices.USERS)
          .setTypes(USER_TYPE)
          .setQuery(someQuery)
          .setSize(0)
          .setSearchType(SearchType.COUNT)
          .addFacet(userFacet)
          .execute().actionGet();
      final TermsFacet facets = (TermsFacet) response.getFacets()
          .facetsAsMap().get(MENTION_FACET_NAME);
  • 32. Query Facets
  • 33. Date Histogram Facet • The histogram facet works with numeric data by building a histogram across intervals of the field values. Each value is placed in a “bucket”.
  • 34. Histogram facet request:
      {
        "query" : {
          "match_all" : {}
        },
        "facets" : {
          "histo1" : {
            "histogram" : {
              "field" : "followers",
              "interval" : 10
            }
          }
        }
      }
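To make the bucketing concrete, here is a minimal Java sketch of what the histogram facet computes for an interval of 10. It assumes each value is keyed by the lower bound of its interval. The class and method names are illustrative and not part of the Elasticsearch API, and the follower counts are made-up sample data.

```java
import java.util.Map;
import java.util.TreeMap;

class HistogramBuckets {
    // Lower bound of the interval that contains `value` (floors toward negative infinity).
    static long bucketKey(long value, long interval) {
        return Math.floorDiv(value, interval) * interval;
    }

    // Counts how many values fall into each bucket, ordered by bucket key.
    static Map<Long, Integer> histogram(long[] values, long interval) {
        Map<Long, Integer> counts = new TreeMap<>();
        for (long v : values) {
            counts.merge(bucketKey(v, interval), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        long[] followers = {3, 7, 12, 18, 25, 104}; // sample data, not from the talk
        System.out.println(histogram(followers, 10)); // {0=2, 10=2, 20=1, 100=1}
    }
}
```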
  • 35. Facets - lessons • Bug in 0.90.x: https://github.com/elasticsearch/elasticsearch/issues/1305 * • Solutions: use 1 shard, or ask for the top 100 instead of the top 10 • * will be solved in 1.0 with the aggregation module
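Why both workarounds help can be seen in a small simulation. This is a sketch under the assumption that each shard reports only its own top terms and the coordinating node sums what it receives. The class, the method names, and the term counts are invented for illustration and do not use any Elasticsearch code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ShardedTopTerms {
    // Top `n` entries of a term->count map, highest count first, ties broken by term.
    static List<Map.Entry<String, Integer>> top(Map<String, Integer> counts, int n) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((x, y) -> {
            int byCount = Integer.compare(y.getValue(), x.getValue());
            return byCount != 0 ? byCount : x.getKey().compareTo(y.getKey());
        });
        return entries.subList(0, Math.min(n, entries.size()));
    }

    // Each shard returns its local top `shardSize` terms; the coordinator sums them
    // and keeps the overall top `size`.
    static List<Map.Entry<String, Integer>> facet(List<Map<String, Integer>> shards,
                                                  int shardSize, int size) {
        Map<String, Integer> merged = new HashMap<>();
        for (Map<String, Integer> shard : shards) {
            for (Map.Entry<String, Integer> e : top(shard, shardSize)) {
                merged.merge(e.getKey(), e.getValue(), Integer::sum);
            }
        }
        return top(merged, size);
    }

    // Invented data: "b" is second on both shards but first overall (18 in total).
    static List<Map<String, Integer>> demoShards() {
        Map<String, Integer> shard0 = new HashMap<>();
        shard0.put("a", 10); shard0.put("b", 9); shard0.put("c", 1);
        Map<String, Integer> shard1 = new HashMap<>();
        shard1.put("c", 10); shard1.put("b", 9); shard1.put("a", 1);
        return Arrays.asList(shard0, shard1);
    }

    public static void main(String[] args) {
        System.out.println(facet(demoShards(), 1, 1)); // [a=10]: wrong term, wrong count
        System.out.println(facet(demoShards(), 3, 1)); // [b=18]: correct once shards return more
    }
}
```

With a single shard there is nothing to merge, and with a larger per-shard request the term that is a local runner-up everywhere is no longer cut off before the merge.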
  • 36. Analyzers • A Lucene analyzer consists of a tokenizer and an arbitrary number of token filters (plus optional char filters)
  • 37. Analysis settings:
      {
        "index": {
          "analysis": {
            "filter": {
              "bigram_shingle_filter": {
                "type": "shingle",
                "max_shingle_size": 2,
                "min_shingle_size": 2,
                "output_unigrams": "false",
                "output_unigrams_if_no_shingles": "false"
              },
              "trigram_shingle_filter": {
                "type": "shingle",
                "max_shingle_size": 3,
                "min_shingle_size": 3,
                "output_unigrams": "false",
                "output_unigrams_if_no_shingles": "false"
              }
            },
            ...
            "analyzer": {
              "bigram_analyzer": {
                "tokenizer": "whitespace",
                "filter": [ "standard", "bigram_shingle_filter" ]
              },
              "trigram_analyzer": {
                "tokenizer": "whitespace",
                "filter": [ "standard", "trigram_shingle_filter" ]
              }
            }
            ...
          }
        }
      }
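As a rough picture of what the bigram and trigram analyzers emit, here is a minimal Java sketch. It is a simplification: it only splits on whitespace and joins adjacent tokens, with unigrams suppressed, and it ignores the "standard" filter. The class and method names are illustrative, and this is not how Lucene implements shingles.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class Shingles {
    // Whitespace-tokenize `text`, then join every run of `size` adjacent tokens with a space.
    static List<String> shingles(String text, int size) {
        String[] tokens = text.trim().split("\\s+");
        List<String> out = new ArrayList<>();
        for (int i = 0; i + size <= tokens.length; i++) {
            out.add(String.join(" ", Arrays.copyOfRange(tokens, i, i + size)));
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "elasticsearch first steps today";
        System.out.println(shingles(text, 2)); // [elasticsearch first, first steps, steps today]
        System.out.println(shingles(text, 3)); // [elasticsearch first steps, first steps today]
    }
}
```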
  • 38. Relations between Documents • Author 1 : N Book • nested: faster reads, update needs reindex, cross object match • parent/child: same shard, no reindex on update, difficult sorting
  • 39. Nested Documents • Specify that the Book type is “nested” in the Author’s mapping • We can query Authors with a query on properties of nested Books • “Authors who published at least a book with Penguin, in scifi genre”
  • 40. curl -XGET localhost:9200/authors/nested_author/_search -d '
      {
        "query": {
          "filtered": {
            "query": {"match_all": {}},
            "filter": {
              "nested": {
                "path": "books",
                "query": {
                  "filtered": {
                    "query": {"match_all": {}},
                    "filter": {
                      "and": [
                        {"term": {"books.publisher": "penguin"}},
                        {"term": {"books.genre": "scifi"}}
                      ]
                    }
                  }
                }
              }
            }
          }
        }
      }'
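The reason for the "nested" type can be shown with a small Java sketch, under the assumption that a plain (non-nested) array of objects is matched field by field across all books, while a nested query must be satisfied by a single book. The classes and the sample author are invented for illustration.

```java
class NestedMatch {
    static final class Book {
        final String publisher;
        final String genre;
        Book(String publisher, String genre) {
            this.publisher = publisher;
            this.genre = genre;
        }
    }

    // Flattened semantics: each term only has to occur in some book of the author.
    static boolean flattenedMatch(Book[] books, String publisher, String genre) {
        boolean hasPublisher = false;
        boolean hasGenre = false;
        for (Book b : books) {
            hasPublisher |= b.publisher.equals(publisher);
            hasGenre |= b.genre.equals(genre);
        }
        return hasPublisher && hasGenre;
    }

    // Nested semantics: both terms must hold for the same book.
    static boolean nestedMatch(Book[] books, String publisher, String genre) {
        for (Book b : books) {
            if (b.publisher.equals(publisher) && b.genre.equals(genre)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Hypothetical author: a Penguin fantasy book and a scifi book from another publisher.
        Book[] books = { new Book("penguin", "fantasy"), new Book("tor", "scifi") };
        System.out.println(flattenedMatch(books, "penguin", "scifi")); // true (false positive)
        System.out.println(nestedMatch(books, "penguin", "scifi"));    // false
    }
}
```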
  • 41. Parent and Child • Indexing happens separately • Specify the _parent type in the Child mapping (Book) • When indexing Books, specify the id of the Author
  • 42. Parent and Child mapping and indexing
      curl -XPOST localhost:9200/authors/book/_mapping -d '{
        "book": {
          "_parent": {"type": "bare_author"}
        }
      }'
      curl -XPOST localhost:9200/authors/book/1?parent=2 -d '{
        "name": "Revelation Space",
        "genre": "scifi",
        "publisher": "penguin"
      }'
  • 43. Parent and Child query
      curl -XPOST localhost:9200/authors/bare_author/_search -d '{
        "query": {
          "has_child": {
            "type": "book",
            "query": {
              "filtered": {
                "query": {"match_all": {}},
                "filter": {
                  "and": [
                    {"term": {"publisher": "penguin"}},
                    {"term": {"genre": "scifi"}}
                  ]
                }
              }
            }
          }
        }
      }'
  • 44. Data Design: Index Configurations • One index “per user” • Single index • SI + Routing: 1 index + custom doc routing to shards • Time: 1 index per time window * • * we can search across indices
  • 45. One Index per user [diagram: nodes Hulk and Thor holding shards User1 s0, User1 s1, User2 s0] • + different sharding per user • - small users own (and cost) at least 1 shard
  • 46. Single Index [diagram: nodes Hulk and Thor holding shards Users s0, Users s3, Users s2] • + filter by user id, support growth • - search hits all shards
  • 47. Single Index + routing [diagram: nodes Hulk and Thor holding shards Users s0, Users s3, Users s2] • + a user’s data is all in one shard, allows large overallocation
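The idea behind custom routing can be sketched in a few lines of Java. This assumes the shard is chosen as a hash of the routing value modulo the number of shards. Java's `String.hashCode` stands in for Elasticsearch's own hash function here, so the shard numbers are illustrative only; the class name and the user ids are hypothetical.

```java
class ShardRouting {
    // Picks a shard from the routing value; floorMod keeps the result in [0, numberOfShards).
    // Assumption: String.hashCode is a stand-in for the hash Elasticsearch really uses.
    static int shardFor(String routing, int numberOfShards) {
        return Math.floorMod(routing.hashCode(), numberOfShards);
    }

    public static void main(String[] args) {
        int shards = 5;
        // Every document routed by the same user id lands on the same shard,
        // so a search routed by that id only needs to visit one shard.
        System.out.println("user1 -> shard " + shardFor("user1", shards));
        System.out.println("user1 -> shard " + shardFor("user1", shards));
        System.out.println("user2 -> shard " + shardFor("user2", shards));
    }
}
```

Because the mapping depends on the number of shards, that number has to be fixed up front, which is one reason the slides recommend overallocating shards.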
  • 48. Index per time range [diagram: nodes Hulk and Thor holding shards 2013_01 s1, 2013_01 s2, 2013_02 s1] • + allows change in future indices
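A search over a time window then only has to name the indices that cover it. The following Java sketch assumes one index per month named like the `2013_01` labels in the diagram. The class and method names are illustrative, and the comma-joined result is what one would put in the index part of the search URL.

```java
import java.time.YearMonth;
import java.util.ArrayList;
import java.util.List;

class TimeIndices {
    // Index name for one month, following the yyyy_MM labels used on the slide.
    static String indexFor(YearMonth month) {
        return String.format("%04d_%02d", month.getYear(), month.getMonthValue());
    }

    // Names of all monthly indices from `from` to `to`, both inclusive.
    static List<String> indicesBetween(YearMonth from, YearMonth to) {
        List<String> names = new ArrayList<>();
        for (YearMonth m = from; !m.isAfter(to); m = m.plusMonths(1)) {
            names.add(indexFor(m));
        }
        return names;
    }

    public static void main(String[] args) {
        List<String> names = indicesBetween(YearMonth.of(2012, 12), YearMonth.of(2013, 2));
        System.out.println(String.join(",", names)); // 2012_12,2013_01,2013_02
    }
}
```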
  • 49. Data Design - lessons • Test, test, test your use case! • Take a single node with one shard and throw load at it, checking the shard capacity • The shard is the scaling unit: overallocate to enable future scaling • #shards > #nodes
  • 50. ...ES has lots of other features! • Bulk operations • Percolator (alerts, classification, …) • Suggesters (“Did you mean …?”) • Index templates (automatic index configuration) • Monitoring API (amount of memory used, number of operations, …) • Plugins • ...
  • 51. Thanks! @matteomoci http://mox.fm