Real-time Data Analytics with Elasticsearch

Elasticsearch has quickly established itself as a leading open-source search technology. Beyond search scenarios, Elasticsearch also shows its strengths in real-time data analytics on "big data". Why? It is a scalable document store and offers performant analysis methods. It provides Hadoop integration for Pig, Hive and MapReduce, and the analytics web frontend Kibana for interactive dashboards and end-user analyses is included as well. In this talk I would like to show the audience how Elasticsearch can be used for real-time data analytics on large data volumes.

Published in: Technology, Business

Real-time Data Analytics with Elasticsearch

  1. Real-time Data Analytics with Elasticsearch. Bernhard Pflugfelder, inovex GmbH
  2. Bernhard Pflugfelder
     ‣ Big Data Engineer @ inovex
     ‣ Fields of interest: search, analytics, big data, BI
     ‣ Working with: Lucene, Solr, Elasticsearch, Hadoop ecosystem
     ‣ bpflugfelder@inovex.de
  3. Agenda
     ‣ elasticsearch intro
     ‣ import your data
     ‣ analyze your data
     ‣ visualize your data
  4. You know, … don’t you?
  5. data analysis landscape: the big picture
  6. elasticsearch intro
     ‣ Lucene under the hood
     ‣ scalable
     ‣ document-oriented
     ‣ plugin architecture
     ‣ REST & JSON
     ‣ Apache 2 license
  7. elasticsearch intro: architecture
     [Diagram: JSON input and JSON output; a master node and two further nodes, with primary and replica copies of shards 1, 2 and 3 spread across them]
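
The three primary shards and their replica copies on slide 7 map onto two index-level settings. The following is a minimal sketch of a request body that would create such an index. It assumes one replica per primary shard (the diagram does not state the replica count) and a hypothetical index name `weblogs`; nothing is sent to a cluster.

```python
import json

# Hypothetical index name, not taken from the slides.
INDEX_NAME = "weblogs"


def index_settings(primary_shards: int, replicas: int) -> dict:
    """Build the JSON body for creating an index (PUT /<index>)."""
    return {
        "settings": {
            "number_of_shards": primary_shards,   # fixed when the index is created
            "number_of_replicas": replicas,       # copies kept per primary shard
        }
    }


# Assumption: 3 primaries as on the slide, 1 replica each.
body = index_settings(primary_shards=3, replicas=1)
total_shard_copies = 3 * (1 + 1)  # primaries plus their replicas

print(f"PUT /{INDEX_NAME}")
print(json.dumps(body, indent=2))
```

With these assumed values the cluster would hold six shard copies in total, which it can distribute over the available nodes.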
  8. elasticsearch intro: architecture
     ‣ fault tolerant
     ‣ node discovery
     ‣ node types
     ‣ high availability
  9. elasticsearch intro: document-oriented & flat data model
  10. elasticsearch intro: core types, mapping, manipulation
     ‣ real-time get
     ‣ core types
     ‣ mapping
     ‣ search query types
     ‣ insert, update, delete
     ‣ snapshot & backup
  11. getting data into elasticsearch …
  12. getting data into elasticsearch
     ‣ logstash
     ‣ index api
     ‣ http bindings
     ‣ rivers
     ‣ spring-data-elasticsearch
     ‣ flume
     ‣ fluentd
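
Of the ingestion paths on slide 12, the index api is the most direct: one JSON document per HTTP request. The sketch below only prepares such a request and does not send it. The host, the index name `weblogs` and the type name `visit` are assumptions, and the `/{index}/{type}/{id}` path follows the Elasticsearch 1.x layout that was current when these slides were written; later versions dropped mapping types, so the path differs there.

```python
import json
import urllib.request

# Assumption: a local node listening on the default HTTP port.
BASE_URL = "http://localhost:9200"


def build_index_request(index, doc_type, doc_id, document):
    """Prepare (but do not send) an index api request for one JSON document."""
    url = f"{BASE_URL}/{index}/{doc_type}/{doc_id}"
    data = json.dumps(document).encode("utf-8")
    return urllib.request.Request(
        url,
        data=data,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# Hypothetical index and type names; the document mirrors the slides' "ref" field.
request = build_index_request("weblogs", "visit", 1, {"ref": "seo"})

print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
# To send it against a running node: urllib.request.urlopen(request)
```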
  13. getting data into elasticsearch: logstash
     ‣ log collection and management tool
     ‣ collects, parses and stores log events
     ‣ became part of the ELK stack
     ‣ seamless integration with elasticsearch
     ‣ plugin architecture:
        ‣ inputs (syslog, ganglia, log4j and more)
        ‣ codecs (json, line, multiline, …)
        ‣ filters (csv, json, date, grep, …)
        ‣ outputs (elasticsearch, …)
     ‣ expect logstash to be promoted to a more general ingestion pipeline
  14. getting data into elasticsearch: rivers
     ‣ works as an elasticsearch plugin
     ‣ service for pulling data into the cluster
     ‣ examples:
        ‣ couchdb river
        ‣ rabbitmq river
        ‣ csv river
        ‣ jdbc river
        ‣ twitter river
        ‣ wikipedia river
     ‣ runs on a single node
     ‣ automatic allocation
     ‣ expected to be deprecated sooner or later
  15. getting data into elasticsearch: elasticsearch and hadoop
     ‣ the former hadoop-elasticsearch project became the official integration of elasticsearch and hadoop
     ‣ makes elasticsearch accessible from hive, pig, cascading and map/reduce
     ‣ automatic mapping between elasticsearch’s json and hadoop file formats
     ‣ every query to elasticsearch is performed by m/r jobs as follows:
        ‣ one mapper task per shard
        ‣ final aggregation by reducer
     ‣ elasticsearch works as a separate data store; index files are not stored in hdfs
     from http://www.elasticsearch.org/blog/elasticsearch-and-hadoop/
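
The "one mapper task per shard, final aggregation by reducer" pattern from slide 15 can be illustrated without Hadoop. The sketch below is an in-process simulation only, not the elasticsearch-hadoop API; the shard contents are hypothetical sample data.

```python
from collections import Counter
from functools import reduce

# Hypothetical shard contents: each inner list stands for the documents
# held by one elasticsearch shard.
shards = [
    [{"ref": "seo"}, {"ref": "direct"}],
    [{"ref": "seo"}, {"ref": "other"}],
    [{"ref": "direct"}, {"ref": "seo"}],
]


def map_shard(docs):
    """Mapper: runs once per shard and emits partial counts per referrer."""
    return Counter(doc["ref"] for doc in docs)


def reduce_counts(left, right):
    """Reducer: merges the partial results of two mappers."""
    return left + right


partials = [map_shard(docs) for docs in shards]  # one mapper task per shard
totals = reduce(reduce_counts, partials, Counter())

print(dict(totals))
```

Because each mapper only touches its own shard, the map phase parallelizes with the number of shards, and the reducer sees small partial results rather than raw documents.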
  16. analyze your data …
  17. analyze your data: you know about facets, I am sure
  18. analyze your data: same analysis methodology, other visualization == kibana panels
  19. analyze your data: next generation of facets
     ‣ facets → aggregations
     ‣ from limited analysis functionality (facets) to enabling custom analysis
  20. analyze your data: aggregations (aggs)
     ‣ give insight into the data space by
        ‣ slicing along dimensions
        ‣ drill down
     ‣ interactive
     ‣ quick by using field data
     ‣ two types of aggregations
     ‣ many types of aggregators
     ‣ customize with scripting
     ‣ used via the search api
     ‣ json in / json out
  21. analyze your data: two aggregation types
     ‣ Bucket aggs: aggregations that split the original set of documents into separate buckets.
     ‣ Metric aggs: aggregations that compute specific metrics over a set of documents by aggregating all documents per bucket.
  22. analyze your data: aggregation example
     ‣ documents: _id: 1, ref: seo; _id: 2, ref: direct; _id: 3, ref: seo; _id: 4, ref: other; _id: 5, ref: direct; _id: 6, ref: seo
     ‣ buckets: ref: seo (id: 1, 3, 6); ref: direct (id: 2, 5); ref: other (id: 4)
     ‣ metrics: 3, 2, 1
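
The example on slide 22 can be reproduced in a few lines. This is a plain-Python sketch of the idea, not elasticsearch code: a bucket step groups the six documents by `ref`, and a metric step counts the documents per bucket. The function names are made up for illustration.

```python
from collections import defaultdict

# The six documents from slide 22.
docs = [
    {"_id": 1, "ref": "seo"},
    {"_id": 2, "ref": "direct"},
    {"_id": 3, "ref": "seo"},
    {"_id": 4, "ref": "other"},
    {"_id": 5, "ref": "direct"},
    {"_id": 6, "ref": "seo"},
]


def terms_buckets(documents, field):
    """Bucket step: one bucket per distinct value of `field`, holding doc ids."""
    buckets = defaultdict(list)
    for doc in documents:
        buckets[doc[field]].append(doc["_id"])
    return dict(buckets)


def doc_count(buckets):
    """Metric step: number of documents in each bucket."""
    return {key: len(ids) for key, ids in buckets.items()}


buckets = terms_buckets(docs, "ref")
metrics = doc_count(buckets)

print(buckets)  # {'seo': [1, 3, 6], 'direct': [2, 5], 'other': [4]}
print(metrics)  # {'seo': 3, 'direct': 2, 'other': 1}
```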
  23. analyze your data: aggregation example
     ‣ documents: _id: 1, ref: seo; _id: 2, ref: direct; _id: 3, ref: seo; _id: 4, ref: other; _id: 5, ref: direct; _id: 6, ref: seo
     ‣ buckets: seo (id: 1, 3, 6); direct (id: 2, 5); other (id: 4)
     ‣ sub-buckets: desktop (id: 1), mobile (id: 3, 6); desktop (id: 1, 3, 6); desktop (id: 2), mobile (id: 5)
     ‣ metrics: 2, 1, 1, 1, 1
  24. analyze your data: customize your analysis with nested aggregators
        "aggregations": {
            "<aggregation_name>": {
                "<aggregation_type>": {
                    <aggregation_body>
                },
                ["aggregations": { [<sub_aggregation>]* }]
            }
            [,"<aggregation_name_2>": { … }]*
        }
     [Diagram: my_aggregation splits into bucket 1, bucket 2, … bucket n, each with metrics …]
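
Filling in the template from slide 24 for the referrer example gives a concrete request body. The sketch below builds it as a Python dict and prints the JSON. The aggregation names `by_ref` and `by_device` and the field name `device` are hypothetical (slide 23 only shows the values desktop and mobile), and `"size": 0` is an assumption to suppress the hit list, since only the aggregation results matter here.

```python
import json


def terms_agg(field, sub_aggs=None):
    """Build a terms bucket aggregation, optionally with nested sub-aggregations."""
    agg = {"terms": {"field": field}}
    if sub_aggs:
        agg["aggregations"] = sub_aggs
    return agg


search_body = {
    "size": 0,  # assumption: return aggregation results only, no hits
    "aggregations": {
        # outer buckets: one per referrer
        "by_ref": terms_agg(
            "ref",
            # inner buckets: one per device within each referrer bucket
            sub_aggs={"by_device": terms_agg("device")},
        ),
    },
}

print(json.dumps(search_body, indent=2))
```

The body would be sent through the search api; each `by_ref` bucket in the response then carries its own list of `by_device` buckets with document counts.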
  25. analyze your data: many types of aggregators
     ‣ bucket aggregators: terms, range, date range, histogram, date histogram, geo distance, geohash grid, ...
     ‣ metric aggregators: min, max, sum, avg, value count, percentiles, cardinality, ...
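
To show how a bucket aggregator and a metric aggregator from slide 25 combine, here is a plain-Python sketch of a histogram with an avg metric per bucket. It assumes the rounding rule key = floor(value / interval) * interval with no offset; the sample values are hypothetical and the code does not talk to elasticsearch.

```python
import math
from collections import defaultdict

# Hypothetical sample values, e.g. response times in milliseconds.
values = [12, 48, 51, 75, 99, 130]


def histogram_avg(numbers, interval):
    """Histogram buckets of fixed width, each with doc_count and avg metrics."""
    buckets = defaultdict(list)
    for value in numbers:
        key = math.floor(value / interval) * interval  # lower bound of the bucket
        buckets[key].append(value)
    return {
        key: {"doc_count": len(members), "avg": sum(members) / len(members)}
        for key, members in sorted(buckets.items())
    }


result = histogram_avg(values, interval=50)

for key, stats in result.items():
    print(key, stats)
```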
  26. visualize your data …
  27. (no text on slide)
  28. (no text on slide)
  29. visualize your data: kibana
     ‣ sharing dashboards
     ‣ lightweight web frontend
     ‣ visualize time-stamped data
     ‣ panels to visualize
     ‣ creating dashboards
     ‣ fancy visualization
  30. wrapping up … and thanks for your attention!
     ‣ elasticsearch is not only a search technology
     ‣ elasticsearch also provides powerful capabilities for data analytics
        ‣ aggregations framework
        ‣ real-time analytics
     ‣ plus: elasticsearch enables you to analyze unstructured along with structured data in one place
     ‣ data analytics ecosystem of elasticsearch:
        ‣ ELK stack (ingestion + analysis + visualization)
        ‣ deep hadoop integration to avoid separate data silos and make use of the advantages of both worlds
  31. Thank you very much for your attention. Contact: Bernhard Pflugfelder, Big Data Engineer. Cell: 0173 3181088. Mail: bernhard.pflugfelder@inovex.de
