
Druid - Real-time interactive analytics at scale

In this talk I would like to introduce you to Druid (http://druid.io/), a powerful open-source technology used by companies like Metamarkets, Yahoo, Netflix and eBay to build a real-time interactive analytics stack. Druid is a distributed columnar datastore built specifically for exploratory analytics in OLAP workflows. With it you can build analytics on event streams that power your dashboards, monitoring, business intelligence and exploratory tools. It scales with your data, up to hundreds of nodes storing and analyzing petabytes, or years, of data, while most queries still return in under a second. All of that works on data that becomes explorable within a few seconds of being ingested. I'm going to talk about Druid specifically, but also about the other technologies involved in building a real-time analytics stack.

Published in: Data & Analytics

  1. 1. Master dataset; Hadoop files; Hive tables; Batch view; Query layer; Postgres?; >15 minutes latency; Pre-computed, pre-aggregated views; ETL; SLOW queries; Kafka
  7. 7. Example query:

         {
           "queryType": "timeseries",
           "dataSource": "wikipedia",
           "intervals": "2013-01-01/2013-01-08",
           "filter": { "type": "selector", "dimension": "page", "value": "Ke$ha" },
           "granularity": "day",
           "aggregations": [ { "type": "count", "name": "rows" } ]
         }
  8. 8. events
  9. 9. events
  10. 10. 12:00 13:00 14:00
  11. 11. * Not mine
  13. 13. Kafka
  17. 17. Do you have any questions?
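
The timeseries query shown on slide 7 can also be built and sent from code. The following is a minimal Python sketch, not something taken from the talk: the field values are copied from the slide, while the helper names `build_timeseries_query` and `run_query` are hypothetical, and the broker address (`localhost:8082`, the default broker port, with the native query endpoint `/druid/v2/`) is an assumption you would adjust for your own cluster.

```python
import json
import urllib.request

# Assumption: a Druid broker reachable on its default port; change for your cluster.
BROKER_URL = "http://localhost:8082/druid/v2/"


def build_timeseries_query(data_source, interval, dimension, value, granularity="day"):
    """Build a native timeseries query that counts rows where `dimension` equals `value`."""
    return {
        "queryType": "timeseries",
        "dataSource": data_source,
        "intervals": interval,  # same ISO-8601 interval string form as on the slide
        "filter": {"type": "selector", "dimension": dimension, "value": value},
        "granularity": granularity,
        "aggregations": [{"type": "count", "name": "rows"}],
    }


def run_query(query, url=BROKER_URL):
    """POST the query as JSON to the broker and return the decoded response.

    Hypothetical helper: it is defined here but not called, because it needs a running cluster.
    """
    request = urllib.request.Request(
        url,
        data=json.dumps(query).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# Rebuild the query from slide 7 and print it as JSON.
query = build_timeseries_query("wikipedia", "2013-01-01/2013-01-08", "page", "Ke$ha")
print(json.dumps(query, indent=2))
```

With a cluster available, `run_query(query)` would send the request; for a day-granularity timeseries query the response is expected to be a list with one entry per day in the interval, each carrying the `rows` count.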
