SlideShare a Scribd company logo
1 of 37
Download to read offline
SQL Now!
NoSQL Now!
San Jose, California
August, 2013
Julian Hyde @julianhyde
About me
• Database: Oracle, Broadbase
• Streaming query: SQLstream
• Open source BI: Mondrian / Pentaho
• Open source SQL: LucidDB, Optiq
• Contributor to Apache Drill, Cascading
Lingual
http://manning.com/back/
http://manning.com/back/
45% off code
mlnosql13
3 things
1. Modern data challenges need both NoSQL
and SQL.
2. Optiq is not a database (and this is a Good
Thing).
3. How to use Optiq with your data.
A disturbance in the Force
SQL has been around a
very, very long time
NoSQL: trade-offs vs
SQL
Gain Lose
Scale out Optimization
Flexibility Data independence
Productivity Central control
Purchase cost Tool integration
SQL: key features
• Inspired by Codd’s 1970 “relational
database” paper
• Semantics based on relational operators
(scan, filter, project, join, agg, union, sort)
• All implementations transform queries
• Optimizer rewrites queries onto available
data structures and operators
Optiq
Optiq design principles
1. Do the stuff that SQL does well
2. Let other systems do what they do well
Conventional DB architecture
Optiq architecture
Examples
Example #1: CSV
• Uses CSV adapter (optiq-csv)
• Demo using sqlline
• Easy to run this for yourself:
$ git clone https://github.com/julianhyde/optiq-csv
$ cd optiq-csv
$ mvn install
$ ./sqlline
!connect jdbc:optiq:model=target/test-classes/model.json admin admin
!tables
!describe emps
SELECT * FROM emps;
EXPLAIN PLAN FOR SELECT * FROM emps;
SELECT depts.name, count(*)
FROM emps JOIN depts USING (deptno)
GROUP BY depts.name;
model.json:
{
version: '1.0',
defaultSchema: 'SALES',
schemas: [
{
name: 'SALES',
type: 'custom',
factory: 'net.hydromatic.optiq.impl.csv.CsvSchemaFactory',
operand: {
directory: 'target/test-classes/sales'
}
}
]
}
More adapters
Adapters Embedded Planned
CSV Cascading (Lingual) HBase (Phoenix)
JDBC Apache Drill Spark
MongoDB Cassandra
Splunk Mondrian
linq4j
SQL on NoSQL
• No schema or schema-on-read
• Optiq MAP columns, dynamic schema
• Nested data (Drill, MongoDB)
• Optiq ARRAY columns
• Source system optimized for transactions &
focused reads, not analytic queries
• Optiq cache and materializations
• Mongo’s standard “zips” data set with all US
zip codes
• Raw ZIPS table with _MAP column
• ZIPS view extracts named fields, including
nested latitude and longitude, and converts
them to SQL types
Example #2: MongoDB
Splunk
• NoSQL database
• Every log file in the enterprise
• A single “table”
• A record for every line in every log file
• A column for every field that exists in
any log file
• No schema
Example #3:
Splunk + MySQL
SELECT p.product_name,
COUNT(*) AS c
FROM splunk.splunk AS s
JOIN mysql.products AS p
ON s.product_id = p.product_id
WHERE s.action = 'purchase'
GROUP BY p.product_name
ORDER BY c DESC
Expression tree
Optimized tree
Analytics
Interactive analytics on
NoSQL?
• NoSQL operational DB (e.g. HBase,
MongoDB, Cassandra)
• Analytic queries aggregate over full scan
• Speed-of-thought response (< 5 seconds)
• Data freshness (< 10 minutes)
Simple analytics
problem
• 100M U.S. census records
• 1KB each record, 100GB total
• 4 SATA3 disks, total 1.2GB/s
• How to count all records in under 5s?
Simple analytics
problem
• 100M U.S. census records
• 1KB each record, 100GB total
• 4 SATA3 disks, total 1.2GB/s
• How to count all records in under 5s?
Simple analytics
problem
• 100M U.S. census records
• 1KB each record, 100GB total
• 4 SATA3 disks, total 1.2GB/s
• How to count all records in under 5s?
• Not possible?! It takes 80s just to read the
data.
Solution: Cheat!
Solution: Cheat!
• Compress data
• Column-oriented storage
• Store data in sorted order
• Put data in memory
• Cache previous query results
• Pre-compute (materialize) aggregates
Mondrian on Optiq on
NoSQL (under construction)
Hybrid analytic
architecture
1. NoSQL source system
2. Access via Optiq SQL
3. Optiq rewrites queries to use materialized data
(e.g. aggregate tables)
4. Cached results are treated as “in-memory
tables”
5. Materializations offline dynamically as underlying
data changes, and go online as they are refreshed
Summary
3 things (reprise)
1. SQL allows you to reorganize your data and
optimize your queries—while still using your
NoSQL database for what it does best.
2. Optiq is not a database. It lets you create
very powerful federated data architectures.
3. Access your data using Optiq adapters.
Write schemas in JSON or use schema SPI.
Connect via JDBC.
Questions?
Thanks!
@julianhyde
optiq https://github.com/julianhyde/optiq
optiq-csv https://github.com/julianhyde/optiq-csv
drill http://incubator.apache.org/drill/
lingual http://www.cascading.org/lingual/
mondrian http://mondrian.pentaho.com
blog http://julianhyde.blogspot.com/
book http://manning.com/back/ 45% off code mlnosql13

More Related Content

What's hot

ONE FOR ALL! Using Apache Calcite to make SQL smart
ONE FOR ALL! Using Apache Calcite to make SQL smartONE FOR ALL! Using Apache Calcite to make SQL smart
ONE FOR ALL! Using Apache Calcite to make SQL smart
Evans Ye
 

What's hot (20)

Apache Calcite Tutorial - BOSS 21
Apache Calcite Tutorial - BOSS 21Apache Calcite Tutorial - BOSS 21
Apache Calcite Tutorial - BOSS 21
 
ONE FOR ALL! Using Apache Calcite to make SQL smart
ONE FOR ALL! Using Apache Calcite to make SQL smartONE FOR ALL! Using Apache Calcite to make SQL smart
ONE FOR ALL! Using Apache Calcite to make SQL smart
 
Apache Calcite (a tutorial given at BOSS '21)
Apache Calcite (a tutorial given at BOSS '21)Apache Calcite (a tutorial given at BOSS '21)
Apache Calcite (a tutorial given at BOSS '21)
 
How to integrate Splunk with any data solution
How to integrate Splunk with any data solutionHow to integrate Splunk with any data solution
How to integrate Splunk with any data solution
 
SQL for NoSQL and how Apache Calcite can help
SQL for NoSQL and how  Apache Calcite can helpSQL for NoSQL and how  Apache Calcite can help
SQL for NoSQL and how Apache Calcite can help
 
A smarter Pig: Building a SQL interface to Apache Pig using Apache Calcite
A smarter Pig: Building a SQL interface to Apache Pig using Apache CalciteA smarter Pig: Building a SQL interface to Apache Pig using Apache Calcite
A smarter Pig: Building a SQL interface to Apache Pig using Apache Calcite
 
Real-World NoSQL Schema Design
Real-World NoSQL Schema DesignReal-World NoSQL Schema Design
Real-World NoSQL Schema Design
 
Data all over the place! How SQL and Apache Calcite bring sanity to streaming...
Data all over the place! How SQL and Apache Calcite bring sanity to streaming...Data all over the place! How SQL and Apache Calcite bring sanity to streaming...
Data all over the place! How SQL and Apache Calcite bring sanity to streaming...
 
SQL on everything, in memory
SQL on everything, in memorySQL on everything, in memory
SQL on everything, in memory
 
Using Apache Calcite for Enabling SQL and JDBC Access to Apache Geode and Oth...
Using Apache Calcite for Enabling SQL and JDBC Access to Apache Geode and Oth...Using Apache Calcite for Enabling SQL and JDBC Access to Apache Geode and Oth...
Using Apache Calcite for Enabling SQL and JDBC Access to Apache Geode and Oth...
 
Smarter Together - Bringing Relational Algebra, Powered by Apache Calcite, in...
Smarter Together - Bringing Relational Algebra, Powered by Apache Calcite, in...Smarter Together - Bringing Relational Algebra, Powered by Apache Calcite, in...
Smarter Together - Bringing Relational Algebra, Powered by Apache Calcite, in...
 
Drill / SQL / Optiq
Drill / SQL / OptiqDrill / SQL / Optiq
Drill / SQL / Optiq
 
Mondrian update (Pentaho community meetup 2012, Amsterdam)
Mondrian update (Pentaho community meetup 2012, Amsterdam)Mondrian update (Pentaho community meetup 2012, Amsterdam)
Mondrian update (Pentaho community meetup 2012, Amsterdam)
 
Streaming SQL (at FlinkForward, Berlin, 2016/09/12)
Streaming SQL (at FlinkForward, Berlin, 2016/09/12)Streaming SQL (at FlinkForward, Berlin, 2016/09/12)
Streaming SQL (at FlinkForward, Berlin, 2016/09/12)
 
Apache Calcite: One Frontend to Rule Them All
Apache Calcite: One Frontend to Rule Them AllApache Calcite: One Frontend to Rule Them All
Apache Calcite: One Frontend to Rule Them All
 
Introduction to Apache Calcite
Introduction to Apache CalciteIntroduction to Apache Calcite
Introduction to Apache Calcite
 
Spark sql
Spark sqlSpark sql
Spark sql
 
Optiq: a SQL front-end for everything
Optiq: a SQL front-end for everythingOptiq: a SQL front-end for everything
Optiq: a SQL front-end for everything
 
Cost-based Query Optimization in Apache Phoenix using Apache Calcite
Cost-based Query Optimization in Apache Phoenix using Apache CalciteCost-based Query Optimization in Apache Phoenix using Apache Calcite
Cost-based Query Optimization in Apache Phoenix using Apache Calcite
 
Building a Dataset Search Engine with Spark and Elasticsearch: Spark Summit E...
Building a Dataset Search Engine with Spark and Elasticsearch: Spark Summit E...Building a Dataset Search Engine with Spark and Elasticsearch: Spark Summit E...
Building a Dataset Search Engine with Spark and Elasticsearch: Spark Summit E...
 

Similar to SQL Now! How Optiq brings the best of SQL to NoSQL data.

Meetup#2: Building responsive Symbology & Suggest WebService
Meetup#2: Building responsive Symbology & Suggest WebServiceMeetup#2: Building responsive Symbology & Suggest WebService
Meetup#2: Building responsive Symbology & Suggest WebService
Minsk MongoDB User Group
 
Searching Billions of Product Logs in Real Time (Use Case)
Searching Billions of Product Logs in Real Time (Use Case)Searching Billions of Product Logs in Real Time (Use Case)
Searching Billions of Product Logs in Real Time (Use Case)
Ryan Tabora
 
A Case Study of NoSQL Adoption: What Drove Wordnik Non-Relational?
A Case Study of NoSQL Adoption: What Drove Wordnik Non-Relational?A Case Study of NoSQL Adoption: What Drove Wordnik Non-Relational?
A Case Study of NoSQL Adoption: What Drove Wordnik Non-Relational?
DATAVERSITY
 
2008 2086 Gangler
2008 2086 Gangler2008 2086 Gangler
2008 2086 Gangler
Secure-24
 
Vitalii Bondarenko - Масштабована бізнес-аналітика у Cloud Big Data Cluster. ...
Vitalii Bondarenko - Масштабована бізнес-аналітика у Cloud Big Data Cluster. ...Vitalii Bondarenko - Масштабована бізнес-аналітика у Cloud Big Data Cluster. ...
Vitalii Bondarenko - Масштабована бізнес-аналітика у Cloud Big Data Cluster. ...
Lviv Startup Club
 
Optimizing MySQL for Cascade Server
Optimizing MySQL for Cascade ServerOptimizing MySQL for Cascade Server
Optimizing MySQL for Cascade Server
hannonhill
 

Similar to SQL Now! How Optiq brings the best of SQL to NoSQL data. (20)

Why databases cry at night
Why databases cry at nightWhy databases cry at night
Why databases cry at night
 
Dev nexus 2017
Dev nexus 2017Dev nexus 2017
Dev nexus 2017
 
No sq lv1_0
No sq lv1_0No sq lv1_0
No sq lv1_0
 
Meetup#2: Building responsive Symbology & Suggest WebService
Meetup#2: Building responsive Symbology & Suggest WebServiceMeetup#2: Building responsive Symbology & Suggest WebService
Meetup#2: Building responsive Symbology & Suggest WebService
 
Searching Billions of Product Logs in Real Time (Use Case)
Searching Billions of Product Logs in Real Time (Use Case)Searching Billions of Product Logs in Real Time (Use Case)
Searching Billions of Product Logs in Real Time (Use Case)
 
Wmware NoSQL
Wmware NoSQLWmware NoSQL
Wmware NoSQL
 
Oracle OpenWo2014 review part 03 three_paa_s_database
Oracle OpenWo2014 review part 03 three_paa_s_databaseOracle OpenWo2014 review part 03 three_paa_s_database
Oracle OpenWo2014 review part 03 three_paa_s_database
 
Apache Kylin: OLAP Engine on Hadoop - Tech Deep Dive
Apache Kylin: OLAP Engine on Hadoop - Tech Deep DiveApache Kylin: OLAP Engine on Hadoop - Tech Deep Dive
Apache Kylin: OLAP Engine on Hadoop - Tech Deep Dive
 
A Case Study of NoSQL Adoption: What Drove Wordnik Non-Relational?
A Case Study of NoSQL Adoption: What Drove Wordnik Non-Relational?A Case Study of NoSQL Adoption: What Drove Wordnik Non-Relational?
A Case Study of NoSQL Adoption: What Drove Wordnik Non-Relational?
 
2008 2086 Gangler
2008 2086 Gangler2008 2086 Gangler
2008 2086 Gangler
 
NoSQL Overview
NoSQL OverviewNoSQL Overview
NoSQL Overview
 
Vitalii Bondarenko - Масштабована бізнес-аналітика у Cloud Big Data Cluster. ...
Vitalii Bondarenko - Масштабована бізнес-аналітика у Cloud Big Data Cluster. ...Vitalii Bondarenko - Масштабована бізнес-аналітика у Cloud Big Data Cluster. ...
Vitalii Bondarenko - Масштабована бізнес-аналітика у Cloud Big Data Cluster. ...
 
Agility and Scalability with MongoDB
Agility and Scalability with MongoDBAgility and Scalability with MongoDB
Agility and Scalability with MongoDB
 
Solr cloud the 'search first' nosql database extended deep dive
Solr cloud the 'search first' nosql database   extended deep diveSolr cloud the 'search first' nosql database   extended deep dive
Solr cloud the 'search first' nosql database extended deep dive
 
Swiss Big Data User Group - Introduction to Apache Drill
Swiss Big Data User Group - Introduction to Apache DrillSwiss Big Data User Group - Introduction to Apache Drill
Swiss Big Data User Group - Introduction to Apache Drill
 
Elasticsearch 5.0
Elasticsearch 5.0Elasticsearch 5.0
Elasticsearch 5.0
 
Devnexus 2018
Devnexus 2018Devnexus 2018
Devnexus 2018
 
Optimizing MySQL for Cascade Server
Optimizing MySQL for Cascade ServerOptimizing MySQL for Cascade Server
Optimizing MySQL for Cascade Server
 
Object Relational Database Management System
Object Relational Database Management SystemObject Relational Database Management System
Object Relational Database Management System
 
Exploring MongoDB & Elasticsearch: Better Together
Exploring MongoDB & Elasticsearch: Better TogetherExploring MongoDB & Elasticsearch: Better Together
Exploring MongoDB & Elasticsearch: Better Together
 

More from Julian Hyde

More from Julian Hyde (20)

Building a semantic/metrics layer using Calcite
Building a semantic/metrics layer using CalciteBuilding a semantic/metrics layer using Calcite
Building a semantic/metrics layer using Calcite
 
Cubing and Metrics in SQL, oh my!
Cubing and Metrics in SQL, oh my!Cubing and Metrics in SQL, oh my!
Cubing and Metrics in SQL, oh my!
 
Adding measures to Calcite SQL
Adding measures to Calcite SQLAdding measures to Calcite SQL
Adding measures to Calcite SQL
 
Morel, a data-parallel programming language
Morel, a data-parallel programming languageMorel, a data-parallel programming language
Morel, a data-parallel programming language
 
Is there a perfect data-parallel programming language? (Experiments with More...
Is there a perfect data-parallel programming language? (Experiments with More...Is there a perfect data-parallel programming language? (Experiments with More...
Is there a perfect data-parallel programming language? (Experiments with More...
 
Morel, a Functional Query Language
Morel, a Functional Query LanguageMorel, a Functional Query Language
Morel, a Functional Query Language
 
The evolution of Apache Calcite and its Community
The evolution of Apache Calcite and its CommunityThe evolution of Apache Calcite and its Community
The evolution of Apache Calcite and its Community
 
What to expect when you're Incubating
What to expect when you're IncubatingWhat to expect when you're Incubating
What to expect when you're Incubating
 
Open Source SQL - beyond parsers: ZetaSQL and Apache Calcite
Open Source SQL - beyond parsers: ZetaSQL and Apache CalciteOpen Source SQL - beyond parsers: ZetaSQL and Apache Calcite
Open Source SQL - beyond parsers: ZetaSQL and Apache Calcite
 
Efficient spatial queries on vanilla databases
Efficient spatial queries on vanilla databasesEfficient spatial queries on vanilla databases
Efficient spatial queries on vanilla databases
 
Tactical data engineering
Tactical data engineeringTactical data engineering
Tactical data engineering
 
Don't optimize my queries, organize my data!
Don't optimize my queries, organize my data!Don't optimize my queries, organize my data!
Don't optimize my queries, organize my data!
 
Spatial query on vanilla databases
Spatial query on vanilla databasesSpatial query on vanilla databases
Spatial query on vanilla databases
 
Apache Calcite: A Foundational Framework for Optimized Query Processing Over ...
Apache Calcite: A Foundational Framework for Optimized Query Processing Over ...Apache Calcite: A Foundational Framework for Optimized Query Processing Over ...
Apache Calcite: A Foundational Framework for Optimized Query Processing Over ...
 
Lazy beats Smart and Fast
Lazy beats Smart and FastLazy beats Smart and Fast
Lazy beats Smart and Fast
 
Don’t optimize my queries, optimize my data!
Don’t optimize my queries, optimize my data!Don’t optimize my queries, optimize my data!
Don’t optimize my queries, optimize my data!
 
Data profiling with Apache Calcite
Data profiling with Apache CalciteData profiling with Apache Calcite
Data profiling with Apache Calcite
 
Data Profiling in Apache Calcite
Data Profiling in Apache CalciteData Profiling in Apache Calcite
Data Profiling in Apache Calcite
 
Streaming SQL
Streaming SQLStreaming SQL
Streaming SQL
 
Streaming SQL
Streaming SQLStreaming SQL
Streaming SQL
 

Recently uploaded

Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
WSO2
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 

Recently uploaded (20)

FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
A Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source MilvusA Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source Milvus
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 

SQL Now! How Optiq brings the best of SQL to NoSQL data.

  • 1. SQL Now! NoSQL Now! San Jose, California August, 2013 Julian Hyde @julianhyde
  • 2. About me • Database: Oracle, Broadbase • Streaming query: SQLstream • Open source BI: Mondrian / Pentaho • Open source SQL: LucidDB, Optiq • Contributor to Apache Drill, Cascading Lingual
  • 5. 3 things 1. Modern data challenges need both NoSQL and SQL. 2. Optiq is not a database (and this is a Good Thing). 3. How to use Optiq with your data.
  • 6. A disturbance in the Force
  • 7. SQL has been around a very, very long time
  • 8. NoSQL: trade-offs vs SQL Gain Lose Scale out Optimization Flexibility Data independence Productivity Central control Purchase cost Tool integration
  • 9. SQL: key features • Inspired by Codd’s 1970 “relational database” paper • Semantics based on relational operators (scan, filter, project, join, agg, union, sort) • All implementations transform queries • Optimizer rewrites queries onto available data structures and operators
  • 10. Optiq
  • 11. Optiq design principles 1. Do the stuff that SQL does well 2. Let other systems do what they do well
  • 15. Example #1: CSV • Uses CSV adapter (optiq-csv) • Demo using sqlline • Easy to run this for yourself: $ git clone https://github.com/julianhyde/optiq-csv $ cd optiq-csv $ mvn install $ ./sqlline
  • 16. !connect jdbc:optiq:model=target/test-classes/model.json admin admin !tables !describe emps SELECT * FROM emps; EXPLAIN PLAN FOR SELECT * FROM emps; SELECT depts.name, count(*) FROM emps JOIN depts USING (deptno) GROUP BY depts.name; model.json: { version: '1.0', defaultSchema: 'SALES', schemas: [ { name: 'SALES', type: 'custom', factory: 'net.hydromatic.optiq.impl.csv.CsvSchemaFactory', operand: { directory: 'target/test-classes/sales' } } ] }
  • 17. More adapters Adapters Embedded Planned CSV Cascading (Lingual) HBase (Phoenix) JDBC Apache Drill Spark MongoDB Cassandra Splunk Mondrian linq4j
  • 18. SQL on NoSQL • No schema or schema-on-read • Optiq MAP columns, dynamic schema • Nested data (Drill, MongoDB) • Optiq ARRAY columns • Source system optimized for transactions & focused reads, not analytic queries • Optiq cache and materializations
  • 19. • Mongo’s standard “zips” data set with all US zip codes • Raw ZIPS table with _MAP column • ZIPS view extracts named fields, including nested latitude and longitude, and converts them to SQL types Example #2: MongoDB
  • 20. Splunk • NoSQL database • Every log file in the enterprise • A single “table” • A record for every line in every log file • A column for every field that exists in any log file • No schema
  • 21. Example #3: Splunk + MySQL SELECT p.product_name, COUNT(*) AS c FROM splunk.splunk AS s JOIN mysql.products AS p ON s.product_id = p.product_id WHERE s.action = 'purchase' GROUP BY p.product_name ORDER BY c DESC
  • 25. Interactive analytics on NoSQL? • NoSQL operational DB (e.g. HBase, MongoDB, Cassandra) • Analytic queries aggregate over full scan • Speed-of-thought response (< 5 seconds) • Data freshness (< 10 minutes)
  • 26. Simple analytics problem • 100M U.S. census records • 1KB each record, 100GB total • 4 SATA3 disks, total 1.2GB/s • How to count all records in under 5s?
  • 27. Simple analytics problem • 100M U.S. census records • 1KB each record, 100GB total • 4 SATA3 disks, total 1.2GB/s • How to count all records in under 5s?
  • 28. Simple analytics problem • 100M U.S. census records • 1KB each record, 100GB total • 4 SATA3 disks, total 1.2GB/s • How to count all records in under 5s? • Not possible?! It takes 80s just to read the data.
  • 30. Solution: Cheat! • Compress data • Column-oriented storage • Store data in sorted order • Put data in memory • Cache previous query results • Pre-compute (materialize) aggregates
  • 31.
  • 32. Mondrian on Optiq on NoSQL (under construction)
  • 33. Hybrid analytic architecture 1. NoSQL source system 2. Access via Optiq SQL 3. Optiq rewrites queries to use materialized data (e.g. aggregate tables) 4. Cached results are treated as “in-memory tables” 5. Materializations offline dynamically as underlying data changes, and go online as they are refreshed
  • 35. 3 things (reprise) 1. SQL allows you to reorganize your data and optimize your queries—while still using your NoSQL database for what it does best. 2. Optiq is not a database. It lets you create very powerful federated data architectures. 3. Access your data using Optiq adapters. Write schemas in JSON or use schema SPI. Connect via JDBC.
  • 37. Thanks! @julianhyde optiq https://github.com/julianhyde/optiq optiq-csv https://github.com/julianhyde/optiq-csv drill http://incubator.apache.org/drill/ lingual http://www.cascading.org/lingual/ mondrian http://mondrian.pentaho.com blog http://julianhyde.blogspot.com/ book http://manning.com/back/ 45% off code mlnosql13