Interactive analytics at scale with Druid

How do you run analytics queries on a dataset that grows by billions of entries a day? What if you need to drill into it, filter it, or aggregate it by any available dimension? On demand, without precomputing, and with sub-second latency? Oh, and this isn't for an internal dashboard: the interface is customer-facing and will be accessed by thousands of clients.

In this talk, I'll tell you how Criteo, one of the biggest ad tech firms in the world, uses Druid to build its new analytics platform.

Druid is an open-source, real-time data store designed to power interactive applications at scale. I’ll walk you through its architecture, explain how it scales, and show how data is stored on disk and in memory to serve queries faster than you can blink.

Published in: Technology

  1. By advertiser… By country… … …
  2. Device   Publisher  Cost
     iPhone   Google     0.1€
     Android  Yahoo      0.2€

     Dimension       Cost
     iPhone          0.1€
     Android         0.2€
     Google          0.1€
     Yahoo           0.2€
     iPhone, Google  0.1€
     Android, Yahoo  0.2€
  3. Device   Publisher  Product   Cost
     iPhone   Google     Computer  0.1€
     Android  Yahoo      Cloth     0.2€

     Dimension                 Cost
     iPhone                    0.1€
     Android                   0.2€
     Google                    0.1€
     Yahoo                     0.2€
     Computer                  0.1€
     Cloth                     0.2€
     iPhone, Google            0.1€
     iPhone, Computer          0.1€
     Google, Computer          0.1€
     Android, Yahoo            0.2€
     Android, Cloth            0.2€
     Yahoo, Cloth              0.2€
     iPhone, Google, Computer  0.1€
     Android, Yahoo, Cloth     0.2€
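Slides 2 and 3 suggest why precomputing every aggregate does not scale: each added dimension roughly doubles the number of dimension combinations (2^n − 1 non-empty subsets for n dimensions). A minimal sketch of that blow-up, using the sample rows from slide 3; the function name and the key layout are hypothetical, chosen only for illustration.

```python
from collections import defaultdict
from itertools import combinations

# Sample rows from slide 3 (cost in euros).
DIMENSIONS = ("device", "publisher", "product")
ROWS = [
    {"device": "iPhone", "publisher": "Google", "product": "Computer", "cost": 0.1},
    {"device": "Android", "publisher": "Yahoo", "product": "Cloth", "cost": 0.2},
]


def precompute_all(rows, dimensions):
    """Sum cost for every non-empty combination of dimensions."""
    totals = defaultdict(float)
    for size in range(1, len(dimensions) + 1):
        for dims in combinations(dimensions, size):
            for row in rows:
                # Key is a tuple of (dimension, value) pairs, e.g.
                # (("device", "iPhone"), ("publisher", "Google")).
                key = tuple((d, row[d]) for d in dims)
                totals[key] += row["cost"]
    return dict(totals)


totals = precompute_all(ROWS, DIMENSIONS)
print(len(totals))  # 14 entries for only 2 rows and 3 dimensions
```

With real data the number of distinct values per dimension multiplies this further, which is the motivation for aggregating at query time instead.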
  4. SELECT sum(revenue) AS "Revenue", sum(sales) AS "Sales"
     FROM "customer-insights"
     WHERE client_id = 2255
       AND date BETWEEN '2014-08-01' AND '2014-08-08'
     GROUP BY day(date)
  5. {
       "queryType": "timeseries",
       "dataSource": "customer-insights",
       "granularity": "day",
       "aggregations": [
         { "type": "longSum", "fieldName": "revenue", "name": "Revenue" },
         { "type": "longSum", "fieldName": "sales", "name": "Sales" }
       ],
       "filter": { "type": "selector", "dimension": "client_id", "value": 2255 },
       "intervals": ["2014-08-01T00:00/2014-08-08T00:00"]
     }
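The timeseries query on the preceding slides is a plain JSON document, so it can be assembled programmatically. The sketch below only builds the request body; the helper name `timeseries_query` and its parameters are hypothetical, and sending the query to a Druid cluster is left out.

```python
import json


def timeseries_query(data_source, client_id, start, end, metrics):
    """Build a timeseries query body shaped like the one on the slides.

    `metrics` is a list of (field_name, output_name) pairs, each summed
    with a longSum aggregator.
    """
    return {
        "queryType": "timeseries",
        "dataSource": data_source,
        "granularity": "day",
        "aggregations": [
            {"type": "longSum", "fieldName": field, "name": name}
            for field, name in metrics
        ],
        "filter": {"type": "selector", "dimension": "client_id", "value": client_id},
        "intervals": [f"{start}/{end}"],
    }


query = timeseries_query(
    "customer-insights",
    2255,
    "2014-08-01T00:00",
    "2014-08-08T00:00",
    [("revenue", "Revenue"), ("sales", "Sales")],
)
print(json.dumps(query, indent=2))
```

Each part of the SQL statement on slide 4 has a counterpart here: the `WHERE client_id` clause becomes the `filter`, the date range becomes `intervals`, `GROUP BY day(date)` becomes `granularity`, and the `sum(...)` columns become `aggregations`.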
  12. {
        "queryType": "groupBy",
        "dataSource": "customer-insights",
        "granularity": "all",
        "dimensions": ["device"],
        "aggregations": [
          { "type": "longSum", "fieldName": "clicks", "name": "Clicks" },
          { "type": "longSum", "fieldName": "cost", "name": "Cost" }
        ],
        "filter": { "type": "selector", "dimension": "client_id", "value": 2255 },
        "intervals": ["2014-08-01T00:00/2014-08-08T00:00"]
      }
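To make the semantics of the groupBy query on slide 12 concrete, here is a small in-memory emulation: keep the rows for one client, group them by `device`, and sum `clicks` and `cost`. It is only a sketch of what the query computes, not of how Druid executes it, and the sample rows are made up for illustration.

```python
from collections import defaultdict

# Hypothetical rows; the values are invented for this example.
EVENTS = [
    {"client_id": 2255, "device": "iPhone", "clicks": 3, "cost": 10},
    {"client_id": 2255, "device": "Android", "clicks": 1, "cost": 4},
    {"client_id": 2255, "device": "iPhone", "clicks": 2, "cost": 6},
    {"client_id": 9999, "device": "iPhone", "clicks": 7, "cost": 20},
]


def group_by_device(rows, client_id):
    """Filter on client_id, then sum clicks and cost per device."""
    result = defaultdict(lambda: {"Clicks": 0, "Cost": 0})
    for row in rows:
        if row["client_id"] != client_id:
            continue  # the selector filter
        bucket = result[row["device"]]  # the "device" dimension
        bucket["Clicks"] += row["clicks"]  # longSum over clicks
        bucket["Cost"] += row["cost"]  # longSum over cost
    return dict(result)


grouped = group_by_device(EVENTS, 2255)
print(grouped)
# {'iPhone': {'Clicks': 5, 'Cost': 16}, 'Android': {'Clicks': 1, 'Cost': 4}}
```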
  13. Device   Publisher  Cost   Time
      iPhone   Google     0.35€  08:12:00
      Android  Yahoo      0.2€   08:12:00

      Device   Publisher  Cost   Time
      iPhone   Google     0.1€   08:12:37
      Android  Yahoo      0.2€   08:12:38
      iPhone   Google     0.15€  08:12:39
      iPhone   Google     0.1€   08:12:40
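Slide 13 appears to illustrate rollup: events that share the same dimension values are pre-aggregated into one row per time bucket, here one minute. A minimal sketch with the rows from the slide, assuming minute granularity; costs are held in euro cents to avoid floating-point rounding, and the function name is hypothetical.

```python
from collections import defaultdict
from datetime import datetime

# Raw events from slide 13: (time, device, publisher, cost in euro cents).
RAW_EVENTS = [
    ("08:12:37", "iPhone", "Google", 10),
    ("08:12:38", "Android", "Yahoo", 20),
    ("08:12:39", "iPhone", "Google", 15),
    ("08:12:40", "iPhone", "Google", 10),
]


def roll_up_by_minute(events):
    """Truncate timestamps to the minute and sum cost per (minute, dimensions)."""
    rolled = defaultdict(int)
    for time_str, device, publisher, cents in events:
        minute = datetime.strptime(time_str, "%H:%M:%S").replace(second=0)
        rolled[(minute.strftime("%H:%M:%S"), device, publisher)] += cents
    return dict(rolled)


rolled = roll_up_by_minute(RAW_EVENTS)
for (minute, device, publisher), cents in rolled.items():
    print(minute, device, publisher, f"{cents / 100:.2f}€")
```

Four raw events collapse into two stored rows. The trade-off is that individual events can no longer be recovered after rollup.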
  14. Device   Publisher  Cost   Time      Email
      iPhone   Google     0.1€   08:12:37  bob@mail.com
      Android  Yahoo      0.2€   08:12:38  joe@mail.com
      iPhone   Google     0.15€  08:13:02  bob@mail.com
      iPhone   Google     0.1€   08:13:08  tony@mail.com
  15. Device   Publisher  Product   Cost  Time
      iPhone   Google     Computer  0.1€  08:12:37
      Android  Yahoo      Cloth     0.2€  08:12:38

      Device   Publisher  Product   Cost  Time
      iPhone   Google     Computer  0.1€  08:12:37
      Android  Yahoo      Cloth     0.2€  08:12:38
  16. Device   Publisher  Product   Cost  Time
      iPhone   Google     Computer  0.1€  08:12:37
      Android  Yahoo      Cloth     0.2€  08:12:38
  17. Google  Yahoo  Microsoft  Google  Yahoo
      1       2      3          1       2
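Slide 17 shows dictionary encoding: each distinct string in a column is replaced by a small integer ID, so repeated values are stored only once. A minimal sketch, assuming IDs are handed out in order of first appearance and start at 1 to match the slide; a real column store may assign IDs differently.

```python
def dictionary_encode(values, first_id=1):
    """Return (dictionary, encoded column) for a list of string values."""
    dictionary = {}
    encoded = []
    for value in values:
        if value not in dictionary:
            # Next unused ID, in order of first appearance.
            dictionary[value] = first_id + len(dictionary)
        encoded.append(dictionary[value])
    return dictionary, encoded


column = ["Google", "Yahoo", "Microsoft", "Google", "Yahoo"]
dictionary, encoded = dictionary_encode(column)
print(dictionary)  # {'Google': 1, 'Yahoo': 2, 'Microsoft': 3}
print(encoded)     # [1, 2, 3, 1, 2]
```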
  18. Google  Yahoo  Microsoft  Google  Yahoo

      Google     1,0,0,1,0
      Yahoo      0,1,0,0,1
      Microsoft  0,0,1,0,0
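Slide 18 shows a bitmap index: for each distinct value, one bit per row marks where that value occurs, so a filter becomes a bitwise operation instead of a scan. The sketch below derives the bitmaps from the column order shown on the slide and uses plain Python lists of 0 and 1; real systems use compressed bitmaps, and the function names are hypothetical.

```python
def build_bitmaps(column):
    """Map each distinct value to a list with 1 at the rows where it occurs."""
    bitmaps = {}
    for row, value in enumerate(column):
        bitmaps.setdefault(value, [0] * len(column))[row] = 1
    return bitmaps


def bitmap_or(a, b):
    """Rows matching either condition."""
    return [x | y for x, y in zip(a, b)]


column = ["Google", "Yahoo", "Microsoft", "Google", "Yahoo"]
bitmaps = build_bitmaps(column)
print(bitmaps["Google"])     # [1, 0, 0, 1, 0]
print(bitmaps["Yahoo"])      # [0, 1, 0, 0, 1]
print(bitmaps["Microsoft"])  # [0, 0, 1, 0, 0]

# publisher = 'Google' OR publisher = 'Yahoo'
google_or_yahoo = bitmap_or(bitmaps["Google"], bitmaps["Yahoo"])
print(google_or_yahoo)       # [1, 1, 0, 1, 1]
```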
  19. Filter              Time period  Aggregated rows  Query time
      Biggest advertiser  6 months     1.5M             225ms
      Android devices     1 month      42M              750ms
      Desktop             1 month      110M             910ms
      RTB networks        7 months     753M             2.4s
