Three talks in 20 minutes
Gian Merlino
gian@imply.io
Who am I?
Gian Merlino
Committer & PMC member on Druid
Cofounder at Imply
Three talks in 20 minutes
● Druid 0.12.0… and beyond!!
● Apache Druid (incubating)
● NoSQL no more: SQL on Druid with Apache Calcite
Druid 0.12.0
…and beyond!!
Druid 0.12.0
● Kafka indexing: incremental publishing
● Kafka indexing: partition multiplexing
● Prioritized locking
● New quantiles sketch
● Parser de-batching
● SQL improvements: performance, expressivity
And beyond!
● Parallel loading of data files without Hadoop
● Indexing errors and statistics APIs
● Automatic compaction
● Better integer compression
● Subtotals, SQL “grouping sets”
● SQL compatible null handling
● Vectorized query engine
● Garbage-free expression engine
Apache Druid
(incubating)
Apache Druid (incubating)
● Started 2018-02-28
● Migration logistics on dev list
● Join our new mailing lists!
● Still figuring out source repos, website, etc.
● Druid 0.12.0 is not an Apache release
● Maybe 0.13.0 will be?
Apache FAQ
● What does incubation mean?
● How long does incubation take?
● Will we keep using GitHub?
● How will releases work?
NoSQL no more
SQL on Druid with Apache Calcite
What is NoSQL?
“There's no strong definition of the concept out there, no trademarks, no standard group, not even a manifesto.”
Source: https://martinfowler.com/bliki/NosqlDefinition.html
What is NoSQL?
● Not using the relational model (nor the SQL language)
● Open source
● Designed to run on large clusters
● Based on the needs of 21st century web properties
● No schema, allowing fields to be added to any record without controls
Source: https://martinfowler.com/bliki/NosqlDefinition.html
Druid and the Relational Model
Is avoiding the SQL language and
relational model really a good thing?
Druid and the Relational Model
● Datasources are like tables
○ Druid “lookups” apply to a common join use case
○ Big, flat tables are common in SQL databases anyway when analytical performance is critical
● Benefits of offering SQL
○ Developers and analysts know it
○ Integration with 3rd party apps
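To make the integration point concrete: Druid SQL can be issued over HTTP by POSTing a JSON body with a "query" field to the Broker's /druid/v2/sql/ endpoint. The sketch below only builds such a request and does not send it. The Broker address and the helper name build_sql_request are assumptions made for illustration, and SQL has to be enabled on the cluster for the endpoint to respond.

```python
import json
import urllib.request

# Assumed Broker address; adjust host and port for a real cluster.
BROKER_URL = "http://localhost:8082/druid/v2/sql/"


def build_sql_request(sql, url=BROKER_URL):
    """Build (but do not send) an HTTP POST carrying a Druid SQL query.

    Hypothetical helper, not part of any Druid client library.
    """
    body = json.dumps({"query": sql}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_sql_request(
    "SELECT dim1, COUNT(*) FROM druid.foo "
    "WHERE dim1 IN ('abc', 'def', 'ghi') GROUP BY dim1"
)

# To run it against a live Broker:
#   with urllib.request.urlopen(request) as response:
#       rows = json.load(response)
```

Any tool that can send an HTTP request or speak JDBC can reach Druid this way, which is the third-party integration benefit in practice.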
Enter…
Apache Calcite
● SQL parser
● Query optimizer
● Query interpreter
● JDBC server (Avatica)
Apache Calcite
● Widely used
○ Druid
○ Hive
○ Storm
○ Samza
○ Drill
○ Phoenix
○ Flink
Apache Calcite
SQL
→ SqlNode (parse tree)
→ RelNode (relational operator tree)
→ RelNode (optimized in target calling convention)
Relational operators
SELECT dim1, COUNT(*)
FROM druid.foo
WHERE dim1 IN ('abc', 'def', 'ghi')
GROUP BY dim1
LogicalAggregate(group=[{0}], EXPR$1=[COUNT()])
  LogicalProject(dim1=[$2])
    LogicalFilter(condition=[OR(=($2, 'abc'), =($2, 'def'), =($2, 'ghi'))])
      LogicalTableScan(table=[[druid, foo]])
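The plan above is a chain of operators, each reading from the one beneath it, with the IN list expanded into an OR of equality tests. The following toy model rebuilds and prints that tree. It is a sketch only: the RelNode class here is a hypothetical stand-in rather than Calcite's own class, and explain and in_to_or are names invented for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RelNode:
    """Toy stand-in for a relational operator; not Calcite's RelNode class."""
    name: str
    attrs: str
    child: Optional["RelNode"] = None


def explain(node, depth=0):
    """Render the operator tree top-down, indenting each input by two spaces."""
    line = "  " * depth + f"{node.name}({node.attrs})"
    if node.child is None:
        return line
    return line + "\n" + explain(node.child, depth + 1)


def in_to_or(column_ref, values):
    """Expand `col IN (v1, v2, ...)` into OR(=(col, v1), =(col, v2), ...)."""
    tests = ", ".join(f"=({column_ref}, '{value}')" for value in values)
    return f"OR({tests})"


# Build the tree bottom-up: scan, then filter, project, aggregate.
scan = RelNode("LogicalTableScan", "table=[[druid, foo]]")
filt = RelNode(
    "LogicalFilter",
    "condition=[" + in_to_or("$2", ["abc", "def", "ghi"]) + "]",
    scan,
)
proj = RelNode("LogicalProject", "dim1=[$2]", filt)
plan = RelNode("LogicalAggregate", "group=[{0}], EXPR$1=[COUNT()]", proj)

print(explain(plan))
```

Reading the printed tree from the bottom up gives the execution order: scan the table, filter rows, project dim1, then aggregate.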
SQL to Native translation
PartialDruidQuery: Druid’s query execution pipeline
● Scan
● Filter
● Project
● Aggregate
● Filter
● Project
● Sort
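One way to read the fixed pipeline is as a row of slots: an operator is accepted only if a matching slot still lies ahead of the last one filled. The sketch below models that idea. The stage order comes from the slide, but the PartialQuery class and its methods are hypothetical and are not Druid's PartialDruidQuery implementation.

```python
# Stage order as listed on the slide.
STAGES = ["Scan", "Filter", "Project", "Aggregate", "Filter", "Project", "Sort"]


class PartialQuery:
    """Hypothetical model: each operator must land in a later slot than the last."""

    def __init__(self):
        self.filled = []     # (slot_index, stage_name) pairs, in insertion order
        self.next_slot = 0   # first slot that is still available

    def add(self, stage):
        # Take the first free slot at or after next_slot that matches this stage.
        for slot in range(self.next_slot, len(STAGES)):
            if STAGES[slot] == stage:
                self.filled.append((slot, stage))
                self.next_slot = slot + 1
                return self
        raise ValueError(f"{stage} does not fit after {self.stages()}")

    def stages(self):
        return [name for _, name in self.filled]


query = PartialQuery().add("Scan").add("Filter").add("Project").add("Aggregate")

# A Filter arriving after the Aggregate lands in the second Filter slot,
# for example a filter on aggregated values.
query.add("Filter")
print(query.filled)
```

An operator chain that cannot be laid out in slot order, such as a Scan after an Aggregate, is rejected in this model with a ValueError.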
SQL to Native translation
PartialDruidQuery
● Scan(table=[[druid, foo]])
● Filter(condition=[OR(=($2, 'abc'), =($2, 'def'), =($2, 'ghi'))])
● Project(dim1=[$2])
● Aggregate(group=[{0}], EXPR$1=[COUNT()])
● Filter
● Project
● Sort
toDruidQuery()
{
  "queryType" : "groupBy",
  "dataSource" : "foo",
  "filter" : {
    "type" : "in",
    "dimension" : "dim1",
    "values" : [ "abc", "def", "ghi" ]
  },
  "dimensions" : [ "dim1" ],
  "aggregations" : [ {
    "type" : "count",
    "name" : "a0"
  } ]
}
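The translation step can be sketched as a function that collapses the OR of equality tests into a single "in" filter and assembles the groupBy JSON. This is an illustration under stated assumptions, not Druid's toDruidQuery(): the function names, the regular expression, and the column map are invented here, and like the slide it leaves out fields such as intervals and granularity that a complete native query carries.

```python
import json
import re

# Matches one equality test of the form =($2, 'abc') in the plan text.
EQUALS = re.compile(r"=\(\$(\d+), '([^']*)'\)")


def or_of_equals_to_in(condition, column_names):
    """Turn OR(=($i, 'a'), =($i, 'b'), ...) on one column into an "in" filter."""
    matches = EQUALS.findall(condition)
    columns = {int(index) for index, _ in matches}
    if not matches or len(columns) != 1:
        raise ValueError("only handles equality tests on a single column")
    return {
        "type": "in",
        "dimension": column_names[columns.pop()],
        "values": [value for _, value in matches],
    }


def to_group_by(data_source, condition, dimensions, column_names):
    """Assemble the abbreviated groupBy query shown on the slide."""
    return {
        "queryType": "groupBy",
        "dataSource": data_source,
        "filter": or_of_equals_to_in(condition, column_names),
        "dimensions": dimensions,
        "aggregations": [{"type": "count", "name": "a0"}],
    }


# Assumption: the plan only tells us that column reference $2 is dim1.
columns = {2: "dim1"}
native = to_group_by(
    "foo",
    "OR(=($2, 'abc'), =($2, 'def'), =($2, 'ghi'))",
    ["dim1"],
    columns,
)
print(json.dumps(native, indent=2))
```

Collapsing the OR into one "in" filter keeps the native query compact, matching the filter in the JSON on the slide.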
Future work
● Druid features not supported in Druid SQL (as of 0.12)
○ Multi-value dimensions
○ Spatial filters
○ Theta sketches (approx. set intersection, differences)
● JOIN related
○ Allow users to write lookups as a SQL JOIN
○ Allow JOINs between two Druid datasources
● Others: SQL window functions, SQL UNION, GROUPING SETS
Full talk slides
https://www.slideshare.net/gianmerlino/nosql-no-more-sql-on-druid-with-apache-calcite
Video should be available in 1–2 months.
Fin
Thank you!
And, we’re hiring: https://imply.io/careers