
ELK - What's new and showcases

Overview of the current state of ELK.
New Elasticsearch aggregations, with examples of how to use them to solve some interesting problems.

Published in: Technology

  1. ELASTICSEARCH & CO. What's new? Tech talk @ ferret, Andrii Gakhov
  2. NEW BRAND: www.elastic.co
  3. ELK
     • Kibana: open source data visualization platform that allows you to interact with your data through stunning, powerful graphics.
     • Elasticsearch: distributed, open source search and analytics engine, designed for horizontal scalability, reliability, and easy management.
     • Logstash: flexible, open source data collection, parsing, and enrichment pipeline.
     • Shield: brings enterprise-grade security to Elasticsearch, protecting the entire ELK stack with encrypted communications, authentication, role-based access control and auditing.
     • Marvel: comprehensive tool that provides you with complete transparency into the status of your Elasticsearch deployment.
     Versions: Elasticsearch 1.4.4, Kibana 4.0.1, Logstash 1.4.2, Marvel, Shield 1.0.1
  4. SHIELD: Security as a Plugin
     Security features for Elasticsearch are implemented in a plugin that you install on each node in your cluster.
  5. ARCHITECTURE NOTES
     • The plugin intercepts inbound API calls in order to enforce authentication and authorization.
     • The plugin provides encryption using Secure Sockets Layer/Transport Layer Security (SSL/TLS) for the network traffic to and from the Elasticsearch node.
     • The plugin uses the API interception layer that enables authentication and authorization to provide audit logging capability.
  6. MAIN FEATURES
     • User Authentication: Shield defines a known set of users (a realm) in order to authenticate users that make requests. The supported realms are esusers and LDAP.
     • Authorization: Shield's data model for action authorization includes Secured Resource, Privilege, Permissions, Role, and Users.
     • Node Authentication and Channel Encryption: Shield uses SSL/TLS to wrap the usual node communication over port 9300. When SSL/TLS is enabled, the nodes validate each other's certificates, establishing trust between the nodes.
     • IP Filtering: Shield provides IP-based access control for Elasticsearch nodes, which lets you restrict which other servers, by IP address, can connect to Elasticsearch nodes and make requests.
     • Auditing: The audit functionality in a secure Elasticsearch cluster logs particular events and activity on that cluster. The events logged include authentication attempts, both granted and denied.
  7. KIBANA
     Kibana 4 provides dozens of new features that enable you to compose questions, get answers, and solve problems like never before.
  8. WHAT'S NEW?
     • New interface built with D3, with a drag-and-drop dashboard builder
     • New diagrams: Area Chart, Data Table, Markdown Text Widget, Pie Chart, Raw Document Widget, Single Metric Widget, Tile Map, Vertical Bar Chart
     • Advanced aggregation-based analytics capabilities: unique counts (cardinality), non-date histograms, ranges, significant terms, percentiles, etc.
     • Expression-based scripted fields enable you to perform ad-hoc analysis by performing computations on the fly
     • Search result highlighting
     • Ability to save searches and visualizations
     • Faster dashboard loading due to a reduction in the number of HTTP calls needed to load the page
     • SSL encryption for client requests as well as requests to and from Elasticsearch
  9. ELASTICSEARCH
  10. WHAT'S NEW? SINCE 1.2.0
      • Upgraded to the Lucene 4.10.1 release
      • New aggregations: percentile_ranks, top_hits, cardinality, scripted_metric, …
      • Added the sum of the doc counts of other buckets in terms aggs
      • Added support for a bounding box aggregation on geo_shape/geo_point data types
      • Parent/child optimization
      • Added support for scripted upserts
      • Fielddata and cache optimisation
      • Removed deprecated gateway functionality
      • …
  11. PERCENTILE RANKS AGGREGATION
      A multi-value metrics aggregation that calculates one or more percentile ranks over numeric values extracted from the aggregated documents.
      Request:
          {
            "aggs": {
              "load_time_outlier": {
                "percentile_ranks": {
                  "field": "load_time",
                  "values": [15, 30]
                }
              }
            }
          }
      Response:
          {
            "aggregations": {
              "load_time_outlier": {
                "values": { "15": 92, "30": 100 }
              }
            }
          }
      The example above shows that 92% of pages were loaded within 15 sec, and 100% within 30 sec.
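To make the meaning of a percentile rank concrete, here is a minimal Python sketch that computes it exactly over an in-memory list. The function name and the sample load times are hypothetical; Elasticsearch computes an approximation over the indexed field, so treat this only as an illustration of what the returned numbers mean.

```python
def percentile_ranks(values, thresholds):
    """Return {threshold: percentage of values <= threshold}, computed exactly."""
    if not values:
        raise ValueError("values must not be empty")
    total = len(values)
    return {
        t: 100.0 * sum(1 for v in values if v <= t) / total
        for t in thresholds
    }


# Hypothetical page load times in seconds (not taken from the slides).
load_times = [2, 3, 5, 8, 9, 11, 12, 14, 15, 22]
ranks = percentile_ranks(load_times, [15, 30])
print(ranks)
```

With this sample, nine of the ten pages loaded within 15 seconds and all of them within 30 seconds.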
  12. TOP HITS AGGREGATION
      A top_hits metric aggregator keeps track of the most relevant documents being aggregated. This aggregator is intended to be used as a sub-aggregator, so that the top matching documents can be aggregated per bucket.
      Request:
          {
            "aggs": {
              "top_logs": {
                "top_hits": {
                  "sort": [ { "created_at": { "order": "desc" } } ],
                  "_source": { "include": [ "path" ] }
                }
              }
            }
          }
      Response:
          {
            "aggregations": {
              "top_logs": {
                "hits": {
                  "total": 180,
                  "hits": [
                    {
                      "_index": "logs",
                      "_type": "log",
                      "_id": "an893d30mlss",
                      "_source": { "path": "/home/user/" },
                      "sort": [ 1422388801000 ]
                      …
                    }
  13. CARDINALITY AGGREGATION
      A single-value metrics aggregation that calculates an approximate count of distinct values. It is based on the HyperLogLog++ algorithm, which counts based on the hashes of the values and has some interesting properties:
      • configurable precision, which decides how to trade memory for accuracy,
      • excellent accuracy on low-cardinality sets,
      • fixed memory usage: whether there are tens or billions of unique values, memory usage depends only on the configured precision.
      Request:
          {
            "aggs": {
              "tags_count": {
                "cardinality": {
                  "field": "tags",
                  "precision_threshold": 100
                }
              }
            }
          }
      Response:
          {
            "aggregations": {
              "tags_count": { "value": 120002 }
            }
          }
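The fixed-memory property is easier to see in code. The sketch below is a plain HyperLogLog estimator in Python, not the HyperLogLog++ variant Elasticsearch uses: it has no bias correction or sparse representation. The parameter `p` (number of index bits) is an assumption of this sketch and is not the same thing as `precision_threshold`.

```python
import hashlib
import math


def hll_estimate(values, p=12):
    """Estimate the number of distinct values using 2**p small registers."""
    m = 1 << p
    registers = [0] * m  # memory is fixed by p, whatever the input size
    for v in values:
        digest = hashlib.sha1(str(v).encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")      # 64-bit hash
        index = h >> (64 - p)                      # first p bits pick a register
        rest = h & ((1 << (64 - p)) - 1)           # remaining 64 - p bits
        rank = (64 - p) - rest.bit_length() + 1    # position of the leftmost 1-bit
        if rank > registers[index]:
            registers[index] = rank
    alpha = 0.7213 / (1 + 1.079 / m)
    raw = alpha * m * m / sum(2.0 ** -r for r in registers)
    zeros = registers.count(0)
    if raw <= 2.5 * m and zeros:
        # Small-range correction: fall back to linear counting.
        return m * math.log(m / zeros)
    return raw


estimate = hll_estimate(range(100000))
print(round(estimate))
```

With `p=12` the estimator keeps 4096 registers and has a typical relative error of about 1.6%, regardless of how many values are fed in. Duplicates never change the registers, so they never change the estimate.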
  14. SCRIPTED METRIC AGGREGATION
      A metric aggregation that executes using scripts to provide a metric output.
          {
            "aggs": {
              "profit": {
                "scripted_metric": {
                  "init_script": "_agg['transactions'] = []",
                  "map_script": "if (doc['type'].value == \"sale\") { _agg.transactions.add(doc['amount'].value) } else { _agg.transactions.add(-1 * doc['amount'].value) }",
                  "combine_script": "profit = 0; for (t in _agg.transactions) { profit += t }; return profit",
                  "reduce_script": "profit = 0; for (a in _aggs) { profit += a }; return profit"
                }
              }
            }
          }
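The four scripts run at different stages: init and map on each shard, combine once per shard, and reduce on the node that collects the shard results. The following Python sketch mimics that flow for the profit example above. The function name, the list-of-shards layout and the sample documents are assumptions made for illustration.

```python
def run_profit_metric(shards):
    """Simulate the init/map/combine/reduce phases of the profit example."""
    shard_results = []
    for docs in shards:
        # init_script: runs once per shard, before any document is seen.
        agg = {"transactions": []}
        # map_script: runs once per matching document on that shard.
        for doc in docs:
            if doc["type"] == "sale":
                agg["transactions"].append(doc["amount"])
            else:
                agg["transactions"].append(-1 * doc["amount"])
        # combine_script: runs once per shard and shrinks the shard state.
        shard_results.append(sum(agg["transactions"]))
    # reduce_script: runs once, over the per-shard results.
    return sum(shard_results)


# Hypothetical documents spread over two shards.
shards = [
    [{"type": "sale", "amount": 80}, {"type": "cost", "amount": 10}],
    [{"type": "sale", "amount": 130}, {"type": "cost", "amount": 30}],
]
profit = run_profit_metric(shards)
print(profit)
```

Combining per shard means only one number per shard travels over the network, instead of the full list of transactions.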
  15. SHOWCASES
  16. PROBLEM I
      Mapping:
          {
            "location": { "type": "geo_point" },
            "tags": { "type": "string", "index": "not_analyzed" },
            "text": { "type": "string", "index": "not_analyzed" }
          }
      Find the most popular tags per location (e.g. grouping by geohash cells of about 10 km x 10 km).
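  17. SOLUTION
      Use geohash_grid and terms aggregations. The geohash_grid precision is a geohash length from 1 to 12, not a distance in km; length 5 (as in the response key "dr5rs") gives cells of roughly 4.9 km x 4.9 km, the closest match to the 10 km target.
          {
            "aggs": {
              "hotspots": {
                "geohash_grid": { "field": "location", "precision": 5 },
                "aggs": {
                  "top_tags": {
                    "terms": { "field": "tags" }
                    …
                  }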
  18. RESPONSE EXAMPLE
          "aggregations": {
            "hotspots": {
              "buckets": [
                {
                  "key": "dr5rs",
                  "doc_count": 2,
                  "top_tags": {
                    "buckets": [
                      { "key": "#NY", "doc_count": 20001 },
                      { "key": "#Obama", "doc_count": 1201 },
                      …
                    ]
                  }
                },
                …
              ]
              …
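As a rough illustration of what the geohash_grid and terms combination computes, the Python sketch below buckets documents by geohash cell and counts tags inside each cell. The helper names and sample documents are hypothetical, and the precision of 5 is an assumption chosen to match the five-character key "dr5rs" in the response example. Elasticsearch does this on the shards, so this is only a model of the result, not of the implementation.

```python
from collections import Counter, defaultdict

BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_encode(lat, lon, precision):
    """Encode a point as a geohash string of the given length."""
    lat_range = [-90.0, 90.0]
    lon_range = [-180.0, 180.0]
    chars = []
    bit_count = 0
    char_index = 0
    use_lon = True  # bits alternate between longitude and latitude
    while len(chars) < precision:
        rng, value = (lon_range, lon) if use_lon else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        char_index <<= 1
        if value >= mid:
            char_index |= 1
            rng[0] = mid
        else:
            rng[1] = mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # every 5 bits become one base32 character
            chars.append(BASE32[char_index])
            bit_count = 0
            char_index = 0
    return "".join(chars)


def top_tags_per_cell(docs, precision=5, size=10):
    """Group docs by geohash cell, then count tags per cell (one count per doc)."""
    cells = defaultdict(Counter)
    for doc in docs:
        loc = doc["location"]
        cell = geohash_encode(loc["lat"], loc["lon"], precision)
        cells[cell].update(set(doc["tags"]))
    return {cell: counts.most_common(size) for cell, counts in cells.items()}


# Hypothetical documents following the mapping of Problem I.
docs = [
    {"location": {"lat": 40.7128, "lon": -74.0060}, "tags": ["#NY", "#Obama"]},
    {"location": {"lat": 40.7128, "lon": -74.0060}, "tags": ["#NY"]},
    {"location": {"lat": 52.5200, "lon": 13.4050}, "tags": ["#Berlin"]},
]
hotspots = top_tags_per_cell(docs)
print(hotspots)
```

Each extra geohash character splits a cell into 32 smaller ones, which is why precision is a length and not a distance.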
  19. PROBLEM II
      Mapping:
          {
            "event": { "type": "string", "index": "not_analyzed" },
            "rating": { "type": "float" }
          }
      Find the total number of records and the average rating for the events with the most rating records.
  20. SOLUTION
      Use terms and avg aggregations.
          {
            "aggs": {
              "top_events": {
                "terms": { "field": "event" },
                "aggs": {
                  "avg_rating": {
                    "avg": { "field": "rating" }
                    …
                  }
  21. RESPONSE EXAMPLE
          "aggregations": {
            "top_events": {
              "buckets": [
                {
                  "key": "Venus Berlin",
                  "doc_count": 36665,
                  "avg_rating": { "value": 9.991 }
                },
                {
                  "key": "ITB Berlin",
                  "doc_count": 365,
                  "avg_rating": { "value": 8.46 }
                }
                …
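The same result shape can be modelled in a few lines of Python: a terms bucket per event, ordered by document count, each carrying an average. The function name and the sample ratings are hypothetical; the default of 10 buckets mirrors the default size of a terms aggregation.

```python
from collections import defaultdict


def top_events(docs, size=10):
    """Bucket docs by event, order buckets by doc count, attach the mean rating."""
    ratings = defaultdict(list)
    for doc in docs:
        ratings[doc["event"]].append(doc["rating"])
    buckets = [
        {"key": event, "doc_count": len(rs), "avg_rating": sum(rs) / len(rs)}
        for event, rs in ratings.items()
    ]
    buckets.sort(key=lambda b: b["doc_count"], reverse=True)
    return buckets[:size]


# Hypothetical rating records following the mapping of Problem II.
docs = [
    {"event": "Venus Berlin", "rating": 9.0},
    {"event": "Venus Berlin", "rating": 10.0},
    {"event": "Venus Berlin", "rating": 8.0},
    {"event": "ITB Berlin", "rating": 7.0},
]
events = top_events(docs)
print(events)
```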
  22. PROBLEM III
      Mapping:
          {
            "tags": { "type": "string", "index": "not_analyzed" },
            "keywords": {
              "type": "nested",
              "properties": {
                "lemma": { "type": "string", "index": "not_analyzed" }
              }
            }
          }
      Find the top tags for the most popular keyword lemmas.
  23. SOLUTION
      Use the nested aggregation together with terms and reverse_nested aggregations.
          {
            "aggs": {
              "kw": {
                "nested": { "path": "keywords" },
                "aggs": {
                  "top_lemmas": {
                    "terms": { "field": "keywords.lemma" },
                    "aggs": {
                      "kw_to_tags": {
                        "reverse_nested": {},
                        "aggs": {
                          "top_tags_per_lemma": {
                            "terms": { "field": "tags" }
                          }
                          …
                        }
  24. RESPONSE EXAMPLE
          "aggregations": {
            "kw": {
              "doc_count": 6829872,
              "top_lemmas": {
                "buckets": [
                  {
                    "key": "BMW",
                    "doc_count": 36665,
                    "kw_to_tags": {
                      "doc_count": 36626,
                      "top_tags_per_lemma": {
                        "buckets": [
                          { "key": "auto", "doc_count": 36626 },
                          { "key": "car", "doc_count": 12216 }
                        ]
                        …
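The two doc_count values per lemma differ because the nested level counts keyword objects while reverse_nested jumps back to the parent documents. The Python sketch below models that distinction; the function name and the sample documents are hypothetical.

```python
from collections import Counter, defaultdict


def top_tags_per_lemma(docs, size=10):
    """Count nested keyword objects per lemma, then tags of their parent docs."""
    nested_counts = Counter()      # nested level: one count per keyword object
    parents = defaultdict(set)     # reverse_nested: parent doc ids per lemma
    for doc_id, doc in enumerate(docs):
        for keyword in doc["keywords"]:
            nested_counts[keyword["lemma"]] += 1
            parents[keyword["lemma"]].add(doc_id)
    buckets = []
    for lemma, nested_count in nested_counts.most_common(size):
        tag_counts = Counter()
        for doc_id in sorted(parents[lemma]):
            tag_counts.update(set(docs[doc_id]["tags"]))
        buckets.append({
            "key": lemma,
            "doc_count": nested_count,
            "kw_to_tags": {
                "doc_count": len(parents[lemma]),
                "top_tags_per_lemma": tag_counts.most_common(size),
            },
        })
    return buckets


# Hypothetical documents following the mapping of Problem III.
docs = [
    {"tags": ["auto", "car"], "keywords": [{"lemma": "BMW"}, {"lemma": "BMW"}]},
    {"tags": ["auto"], "keywords": [{"lemma": "BMW"}, {"lemma": "Audi"}]},
]
lemmas = top_tags_per_lemma(docs)
print(lemmas[0])
```

In the sample, "BMW" appears in three keyword objects but only two parent documents, which matches the pattern in the response example where the reverse_nested count is lower than the lemma count.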
  25. PROBLEM IV
      Mapping:
          {
            "tags": { "type": "string", "index": "not_analyzed" },
            "text": { "type": "string", "index": "not_analyzed" },
            "created_at": { "type": "date" }
          }
      Find the latest tweets for the most popular tags.
  26. SOLUTION
      Use terms and top_hits aggregations.
          {
            "aggs": {
              "top_tags": {
                "terms": { "field": "tags" },
                "aggs": {
                  "top_tweets": {
                    "top_hits": {
                      "sort": [ { "created_at": { "order": "desc" } } ]
                    }
                    …
                  }
  27. RESPONSE EXAMPLE
          "aggregations": {
            "top_tags": {
              "buckets": [
                {
                  "key": "#TheDress",
                  "doc_count": 30000,
                  "top_tweets": {
                    "hits": {
                      "total": 30000,
                      "hits": [
                        {
                          "_index": "tweets",
                          "_type": "tweet",
                          "_id": "579024639982202880",
                          "_source": {
                            "tags": [ "#TheDress", "#TheSims4" ],
                            "text": "just put #TheDress in #TheSims4!",
                            "created_at": "2015-03-20T20:00:01"
                          },
                          "sort": [ 1422388801000 ]
                          …
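A small Python model of the terms plus top_hits combination: bucket tweets by tag, order buckets by size, and keep only the newest few tweets in each. The function name and the sample tweets are hypothetical; the default of three hits per bucket mirrors the top_hits default, and timestamps are assumed to be ISO 8601 strings in one format so they sort correctly as text.

```python
from collections import defaultdict


def latest_tweets_per_tag(tweets, tag_size=10, hits_size=3):
    """Bucket tweets by tag and keep the newest hits_size tweets per bucket."""
    by_tag = defaultdict(list)
    for tweet in tweets:
        for tag in set(tweet["tags"]):
            by_tag[tag].append(tweet)
    ordered = sorted(by_tag.items(), key=lambda item: len(item[1]), reverse=True)
    return [
        {
            "key": tag,
            "doc_count": len(tagged),
            "top_tweets": sorted(
                tagged, key=lambda t: t["created_at"], reverse=True
            )[:hits_size],
        }
        for tag, tagged in ordered[:tag_size]
    ]


# Hypothetical tweets following the mapping of Problem IV.
tweets = [
    {"tags": ["#TheDress"], "text": "blue and black", "created_at": "2015-03-18T09:00:00"},
    {"tags": ["#TheDress"], "text": "white and gold", "created_at": "2015-03-19T09:00:00"},
    {"tags": ["#TheDress", "#TheSims4"], "text": "just put #TheDress in #TheSims4!",
     "created_at": "2015-03-20T20:00:01"},
    {"tags": ["#TheDress"], "text": "still arguing", "created_at": "2015-03-20T19:00:00"},
]
top = latest_tweets_per_tag(tweets)
print(top[0]["key"], [t["text"] for t in top[0]["top_tweets"]])
```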
  28. PROBLEM V
      Mapping:
          {
            "topics": { "type": "string", "index": "not_analyzed" },
            "title": { "type": "string" },
            "created_at": { "type": "date" }
          }
      Find news items that contain "Obama" in the title, plus the top topics across all news regardless of the title.
  29. SOLUTION
      Use query_string, global and terms aggregations.
          {
            "query": {
              "query_string": { "default_field": "title", "query": "Obama" }
            },
            "aggs": {
              "all_news": {
                "global": {},
                "aggs": {
                  "top_topics": {
                    "terms": { "field": "topics" }
                    …
                  }
  30. RESPONSE EXAMPLE
          "hits": {
            "total": 23,
            "max_score": 2.9730792,
            "hits": [
              {
                "_index": "news",
                "_type": "record",
                "_id": 6785,
                "_score": 2.9730792,
                "_source": …
              },
              …
            ]
          },
          "aggregations": {
            "all_news": {
              "doc_count": 24495,
              "top_topics": {
                "buckets": [
                  { "key": "Politics", "doc_count": 20001 }
                  …
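The point of the global aggregation is that it ignores the query: hits are filtered, but the topic counts cover every document. The Python sketch below models that split. The function name and sample news items are hypothetical, and the whitespace-based title match is only a crude stand-in for a query_string query on an analyzed field.

```python
from collections import Counter


def search_with_global_topics(docs, term, size=10):
    """Return hits matching the term, plus topic counts over ALL docs."""
    needle = term.lower()
    hits = [d for d in docs if needle in d["title"].lower().split()]
    topic_counts = Counter()
    for doc in docs:  # deliberately not restricted to hits, like "global"
        topic_counts.update(set(doc["topics"]))
    return {
        "hits": {"total": len(hits), "hits": hits},
        "aggregations": {
            "all_news": {
                "doc_count": len(docs),
                "top_topics": topic_counts.most_common(size),
            }
        },
    }


# Hypothetical news records following the mapping of Problem V.
news = [
    {"title": "Obama visits Berlin", "topics": ["Politics"]},
    {"title": "Elections next week", "topics": ["Politics"]},
    {"title": "Cup final tonight", "topics": ["Sports"]},
]
result = search_with_global_topics(news, "Obama")
print(result["hits"]["total"], result["aggregations"]["all_news"])
```

Without the global wrapper, the topic counts would be computed only over the documents matched by the query.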
  31. THANK YOU
