Druid meetup 2018-03-13

Three talks in 20 minutes:

- Druid 0.12.0… and beyond!!
- Apache Druid (incubating)
- NoSQL no more: SQL on Druid with Apache Calcite

  1. Three talks in 20 minutes. Gian Merlino, gian@imply.io
  2. Who am I? Gian Merlino: committer and PMC member on Druid; cofounder at Imply.
  3. Three talks in 15 minutes
      - Druid 0.12.0… and beyond!!
      - Apache Druid (incubating)
      - NoSQL no more: SQL on Druid with Apache Calcite
  4. Druid 0.12.0… and beyond!!
  5. Druid 0.12.0
      - Kafka indexing: incremental publishing
      - Kafka indexing: partition multiplexing
      - Prioritized locking
      - New quantiles sketch
      - Parser de-batching
      - SQL improvements: performance, expressivity
  6. And beyond!
      - Parallel loading of data files without Hadoop
      - Indexing errors and statistics APIs
      - Automatic compaction
      - Better integer compression
      - Subtotals, SQL “grouping sets”
      - SQL-compatible null handling
      - Vectorized query engine
      - Garbage-free expression engine
  7. Apache Druid (incubating)
  8. Apache Druid (incubating)
      - Started 2018-02-28
      - Migration logistics on dev list
      - Join our new mailing lists!
      - Still figuring out source repos, website, etc.
      - Druid 0.12.0 is not an Apache release
      - Maybe 0.13.0 will be?
  9. Apache FAQ: What does incubation mean?
  10. Apache FAQ: How long does incubation take?
  11. Apache FAQ: Will we keep using GitHub?
  12. Apache FAQ: How will releases work?
  13. NoSQL no more: SQL on Druid with Apache Calcite
  14. What is NoSQL? “There's no strong definition of the concept out there, no trademarks, no standard group, not even a manifesto.” (Source: https://martinfowler.com/bliki/NosqlDefinition.html)
  15. What is NoSQL?
      - Not using the relational model (nor the SQL language)
      - Open source
      - Designed to run on large clusters
      - Based on the needs of 21st century web properties
      - No schema, allowing fields to be added to any record without controls
      Source: https://martinfowler.com/bliki/NosqlDefinition.html
  16. Druid and the Relational Model: Is avoiding the SQL language and relational model really a good thing?
  17. Druid and the Relational Model
      - Datasources are like tables
        - Druid “lookups” apply to a common join use case
        - Big, flat tables are common in SQL databases anyway, when analytical performance is critical
      - Benefits of offering SQL
        - Developers and analysts know it
        - Integration with 3rd party apps
  18. Enter…
  19. Apache Calcite
      - SQL parser
      - Query optimizer
      - Query interpreter
      - JDBC server (Avatica)
  20. Apache Calcite
      - Widely used
        - Druid
        - Hive
        - Storm
        - Samza
        - Drill
        - Phoenix
        - Flink
  21. Apache Calcite: SQL → SqlNode (parse tree) → RelNode (relational operator tree) → RelNode (optimized, in target calling convention)
  22. Relational operators
      SELECT dim1, COUNT(*) FROM druid.foo WHERE dim1 IN ('abc', 'def', 'ghi') GROUP BY dim1
      LogicalAggregate(group=[{0}], EXPR$1=[COUNT()])
        LogicalProject(dim1=[$2])
          LogicalFilter(condition=[OR(=($2, 'abc'), =($2, 'def'), =($2, 'ghi'))])
            LogicalTableScan(table=[[druid, foo]])
  23. SQL to native translation
      PartialDruidQuery (Druid’s query execution pipeline): Scan → Filter → Project → Aggregate → Filter → Project → Sort
  24. SQL to native translation
      Relational operators: Scan(table=[[druid, foo]]) → Filter(condition=[OR(=($2, 'abc'), =($2, 'def'), =($2, 'ghi'))]) → Project(dim1=[$2]) → Aggregate(group=[{0}], EXPR$1=[COUNT()])
      These are stacked into a PartialDruidQuery, and toDruidQuery() produces the native query:
      {
        "queryType" : "groupBy",
        "dataSource" : "foo",
        "filter" : { "type" : "in", "dimension" : "dim1", "values" : [ "abc", "def", "ghi" ] },
        "dimensions" : [ "dim1" ],
        "aggregations" : [ { "type" : "count", "name" : "a0" } ]
      }
  25. Future work
      - Druid features not supported in Druid SQL (as of 0.12)
        - Multi-value dimensions
        - Spatial filters
        - Theta sketches (approx. set intersection, differences)
      - JOIN-related
        - Allow users to write lookups as a SQL JOIN
        - Allow JOINs between two Druid datasources
      - Others: SQL window functions, SQL UNION, GROUPING SETS
  26. Full talk slides: https://www.slideshare.net/gianmerlino/nosql-no-more-sql-on-druid-with-apache-calcite. Video should be available in 1–2 months.
  27. Fin. Thank you! And we’re hiring: https://imply.io/careers
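
The logical plan on slide 22 is read bottom-up. LogicalTableScan emits rows. LogicalFilter keeps the rows whose field $2 (dim1) equals one of the IN values. LogicalProject keeps only that field. LogicalAggregate then groups on field {0} of the projected row and counts. The Python sketch below evaluates that operator tree over a few in-memory rows. It is a toy evaluator written for illustration, not Calcite's actual interpreter. The row layout (__time, cnt, dim1) and the sample values are made up. Only the operator order and the field references come from the slide.

```python
from collections import Counter
from typing import Iterable, Iterator, List, Sequence, Tuple

Row = Tuple[object, ...]


def table_scan(rows: Sequence[Row]) -> Iterator[Row]:
    # LogicalTableScan: emit every row of the table.
    return iter(rows)


def logical_filter(rows: Iterable[Row], field: int, values: Sequence[object]) -> Iterator[Row]:
    # LogicalFilter with OR(=($field, v1), =($field, v2), ...): keep rows matching any value.
    return (row for row in rows if any(row[field] == v for v in values))


def logical_project(rows: Iterable[Row], fields: Sequence[int]) -> Iterator[Row]:
    # LogicalProject: keep only the listed fields, in the listed order.
    return (tuple(row[i] for i in fields) for row in rows)


def logical_aggregate_count(rows: Iterable[Row], group: Sequence[int]) -> List[Row]:
    # LogicalAggregate with COUNT(): one output row (key..., count) per distinct group key.
    counts = Counter(tuple(row[i] for i in group) for row in rows)
    # Sorted only so the output order is deterministic.
    return sorted(key + (n,) for key, n in counts.items())


# Hypothetical rows of druid.foo, laid out as (__time, cnt, dim1).
FOO = [
    ("2000-01-01", 1, "abc"),
    ("2000-01-02", 1, "def"),
    ("2000-01-03", 1, "abc"),
    ("2000-01-04", 1, "xyz"),
]

if __name__ == "__main__":
    result = logical_aggregate_count(
        logical_project(
            logical_filter(table_scan(FOO), field=2, values=["abc", "def", "ghi"]),
            fields=[2],
        ),
        group=[0],
    )
    print(result)  # [('abc', 2), ('def', 1)]
```

Nesting the calls mirrors the tree on the slide: the innermost call is the leaf (the scan), and the outermost is the root (the aggregate).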
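
Slide 23 lists the stages a PartialDruidQuery can hold, in the order of Druid's query execution pipeline: Scan, Filter, Project, Aggregate, Filter, Project, Sort. One way to picture the translation is to stack the relational operators, bottom-up, onto that fixed sequence. Under that picture, a Filter below the Aggregate plays the WHERE role and a Filter above it plays the HAVING role. The Python sketch below encodes that rule and rejects an operator that cannot be placed in a later stage. The stage names, the greedy "earliest fitting stage" placement, and the rejection behavior are all assumptions made for illustration. They are not taken from Druid's PartialDruidQuery source.

```python
from typing import List, Optional

# Stage order from the slide: Scan, Filter, Project, Aggregate, Filter, Project, Sort.
# The names are hypothetical labels, not Druid's own identifiers.
STAGES = [
    "SCAN",
    "WHERE_FILTER",
    "SELECT_PROJECT",
    "AGGREGATE",
    "HAVING_FILTER",
    "POST_PROJECT",
    "SORT",
]

# Stages that each relational operator is allowed to occupy.
CANDIDATES = {
    "Scan": ["SCAN"],
    "Filter": ["WHERE_FILTER", "HAVING_FILTER"],
    "Project": ["SELECT_PROJECT", "POST_PROJECT"],
    "Aggregate": ["AGGREGATE"],
    "Sort": ["SORT"],
}

# Stages that only make sense once an Aggregate has been placed.
NEEDS_AGGREGATE = {"HAVING_FILTER", "POST_PROJECT"}


def stack(operators: List[str]) -> Optional[List[str]]:
    """Assign each operator (listed bottom-up, starting with Scan) to a stage.

    Returns the assigned stages, or None if the operators cannot be placed
    in strictly increasing stage order.
    """
    if not operators or operators[0] != "Scan":
        return None
    assigned: List[str] = []
    current = -1  # index in STAGES of the last stage filled
    for op in operators:
        allowed = [
            STAGES.index(stage)
            for stage in CANDIDATES.get(op, [])
            if stage not in NEEDS_AGGREGATE or "AGGREGATE" in assigned
        ]
        later = [i for i in allowed if i > current]
        if not later:
            return None  # no remaining stage can hold this operator
        current = min(later)  # earliest stage that still fits
        assigned.append(STAGES[current])
    return assigned


if __name__ == "__main__":
    # The plan from slide 22, read bottom-up.
    print(stack(["Scan", "Filter", "Project", "Aggregate"]))
    # ['SCAN', 'WHERE_FILTER', 'SELECT_PROJECT', 'AGGREGATE']
    print(stack(["Scan", "Aggregate", "Filter", "Sort"]))
    # ['SCAN', 'AGGREGATE', 'HAVING_FILTER', 'SORT']
    print(stack(["Scan", "Aggregate", "Aggregate"]))
    # None
```

The fixed stage order is what makes the check cheap. Each new operator only has to be compared against the last stage filled, and the whole plan never needs to be re-examined.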
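
Slide 24 shows where the operators end up in the native query. The OR of equalities on $2 becomes a single native "in" filter on dim1. The grouping key becomes a dimension. COUNT() becomes a "count" aggregator named a0. The Python sketch below builds the same JSON from a simplified description of the plan. Several parts are assumptions for illustration: the (field_index, value) encoding of the condition, the helper names, the partial row signature, and folding the Project step into the grouping indexes. The real toDruidQuery() works on Calcite RelNodes. A complete groupBy query also needs fields such as "intervals" and "granularity", which the slide and this sketch leave out.

```python
import json
from typing import Dict, List, Sequence, Tuple

# OR(=($i, v1), =($i, v2), ...) encoded as a list of (field_index, value) pairs.
# This encoding is a hypothetical stand-in for Calcite's condition expressions.
Condition = List[Tuple[int, str]]


def to_in_filter(condition: Condition, signature: Sequence[str]) -> Dict:
    """Collapse an OR of equalities on one field into a native "in" filter."""
    fields = {index for index, _ in condition}
    if len(fields) != 1:
        raise ValueError("this sketch only handles equalities on a single field")
    (index,) = fields
    return {
        "type": "in",
        "dimension": signature[index],  # resolve $index to a column name
        "values": [value for _, value in condition],
    }


def to_druid_query(
    datasource: str,
    signature: Sequence[str],
    condition: Condition,
    group_fields: Sequence[int],
) -> Dict:
    """Build a groupBy query for SELECT col, COUNT(*) ... WHERE ... GROUP BY col.

    group_fields index into the scan's row signature; the sketch folds the
    Project step away instead of tracking projected field positions.
    """
    return {
        "queryType": "groupBy",
        "dataSource": datasource,
        "filter": to_in_filter(condition, signature),
        "dimensions": [signature[i] for i in group_fields],
        "aggregations": [{"type": "count", "name": "a0"}],
    }


if __name__ == "__main__":
    # Hypothetical row signature of "foo"; the slide only implies that $2 is dim1.
    foo_signature = ["__time", "cnt", "dim1"]
    query = to_druid_query(
        "foo",
        foo_signature,
        [(2, "abc"), (2, "def"), (2, "ghi")],
        group_fields=[2],
    )
    print(json.dumps(query, indent=2))
```

Collapsing the OR into one "in" filter keeps the native query compact, and it matches the filter shape shown on the slide.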