Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

Real-time Analytics with Apache Flink and Druid

1,169 views

Published on

Slides of the talk I gave @ Berlin Buzzwords 2016.

Published in: Engineering
  • Be the first to comment

Real-time Analytics with Apache Flink and Druid

  1. 1. REAL-TIME ANALYTICS WITH APACHE FLINK AND DRUID Berlin Buzzwords 2016 Jan Graßegger - @gesundkrank
  2. 2. DATA ENGINEER @
  3. 3. OUR DATA 70,000EVENTS PER SECOND 50DIMENSIONS 20METRICS
  4. 4. DRUID
  5. 5. DRUID ‣ Online Analytical Processing (OLAP) System ‣ Column-oriented ‣ Distributed ‣ Built-in data sharding based on time windows ‣ JSON query language
  6. 6. DATA STRUCTURES Column TOP PRIVATE DOMAIN battle.net battle.net noxxic.com noxxic.com Strings to Integers battle.net 5 noxxic.com 6 Encoded column data [5, 5, 6, 6]
  7. 7. DATA STRUCTURES Column Bitmap Indices battle.net [1, 1, 0, 0] noxxic.com [0, 0, 1, 1] TOP PRIVATE DOMAIN battle.net battle.net noxxic.com noxxic.com
  8. 8. FIREHOSES FIREHOSES
  9. 9. APACHE FLINK
  10. 10. PROCESSING ?Kafka Flink Druid
  11. 11. TRANQUILITY
  12. 12. TRANQUILITY ‣ Helps ingesting real-time data into Druid ‣ Provides adapters for Samza, Spark, Storm and Flink ‣ Standalone HTTP and Kafka applications
  13. 13. Kafka Flink Druid Tranquility PROCESSING Replays?
  14. 14. LAMBDA
  15. 15. KAPPA
  16. 16. Kafka Flink Druid Tranquility HDFS for replays PROCESSING
  17. 17. RESULTS ▸Kappa-like architecture that’s able to do replays from HDFS & Kafka ▸Added Flink sink to Tranquility ▸“Hacked“ replays into Tranquility ▸Real-Time Reporting
  18. 18. QUESTIONS?

×