SQL Now! How Optiq brings the best of SQL to NoSQL data.

SQL Now!
NoSQL Now!
San Jose, California
August, 2013
Julian Hyde @julianhyde

About me
• Database: Oracle, Broadbase
• Streaming query: SQLstream
• Open source BI: Mondrian / Pentaho
• Open source SQL: LucidDB, Optiq
• Contributor to Apache Drill, Cascading Lingual

http://manning.com/back/
45% off code
mlnosql13

3 things
1. Modern data challenges need both NoSQL and SQL.
2. Optiq is not a database (and this is a Good Thing).
3. How to use Optiq with your data.

SQL has been around a very, very long time

NoSQL: trade-offs vs SQL
Gain            Lose
Scale out       Optimization
Flexibility     Data independence
Productivity    Central control
Purchase cost   Tool integration

SQL: key features
• Inspired by Codd’s 1970 “relational database” paper
• Semantics based on relational operators (scan, filter, project, join, agg, union, sort)
• All implementations transform queries
• Optimizer rewrites queries onto available data structures and operators
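
The operator list above can be made concrete with a small sketch. This is not Optiq code: the tables, column names, and function names below are all hypothetical, and the "rewrite" is done by hand. It only illustrates that a plan built from relational operators can be rearranged (here, a filter pushed below a join) without changing the result.

```python
from collections import Counter

# Hypothetical in-memory tables; each row is a plain dict.
EMPS = [
    {"name": "Fred", "deptno": 10},
    {"name": "Eric", "deptno": 20},
    {"name": "Alice", "deptno": 10},
]
DEPTS = [
    {"deptno": 10, "dname": "Sales"},
    {"deptno": 20, "dname": "Marketing"},
]

def scan(table):
    """Yield every row of a table."""
    yield from table

def filter_(rows, predicate):
    """Keep only the rows for which predicate(row) is true."""
    return (r for r in rows if predicate(r))

def join(left, right, key):
    """Inner hash join on a single shared column."""
    index = {}
    for r in right:
        index.setdefault(r[key], []).append(r)
    for l in left:
        for r in index.get(l[key], []):
            yield {**l, **r}

def agg_count(rows, column):
    """GROUP BY column, COUNT(*)."""
    return dict(Counter(r[column] for r in rows))

def is_dept_10(row):
    return row["deptno"] == 10

# Plan as written: join everything, then filter.
naive = list(filter_(join(scan(EMPS), scan(DEPTS), "deptno"), is_dept_10))

# Rewritten plan: filter pushed below the join, so fewer rows are joined.
pushed = list(join(filter_(scan(EMPS), is_dept_10), scan(DEPTS), "deptno"))

print(naive == pushed)
print(agg_count(pushed, "dname"))
```

Both plans return the same rows; the second simply does less work, which is the kind of choice an optimizer makes on the user's behalf.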

Optiq design principles
1. Do the stuff that SQL does well
2. Let other systems do what they do well

Example #1: CSV
• Uses CSV adapter (optiq-csv)
• Demo using sqlline
• Easy to run this for yourself:
$ git clone https://github.com/julianhyde/optiq-csv
$ cd optiq-csv
$ mvn install
$ ./sqlline

!connect jdbc:optiq:model=target/test-classes/model.json admin admin
!tables
!describe emps
SELECT * FROM emps;
EXPLAIN PLAN FOR SELECT * FROM emps;
SELECT depts.name, count(*)
FROM emps JOIN depts USING (deptno)
GROUP BY depts.name;
model.json:
{
  version: '1.0',
  defaultSchema: 'SALES',
  schemas: [
    {
      name: 'SALES',
      type: 'custom',
      factory: 'net.hydromatic.optiq.impl.csv.CsvSchemaFactory',
      operand: {
        directory: 'target/test-classes/sales'
      }
    }
  ]
}
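
To see what the join query in the demo computes without building optiq-csv, here is a rough stand-in using only Python's csv and sqlite3 modules. It is not how Optiq works internally (Optiq queries the files in place; this sketch copies them into an in-memory database), and the CSV contents below are made-up sample rows, not the files shipped with optiq-csv. An ORDER BY is added only to make the printed result deterministic.

```python
import csv
import io
import sqlite3

# Made-up sample data standing in for the CSV files in the schema directory.
EMPS_CSV = "empno,name,deptno\n100,Fred,10\n110,Eric,20\n120,Alice,10\n"
DEPTS_CSV = "deptno,name\n10,Sales\n20,Marketing\n"

def load_csv(conn, table, text):
    """Create `table` from CSV text, using the header row as column names."""
    rows = list(csv.reader(io.StringIO(text)))
    header, data = rows[0], rows[1:]
    conn.execute(f"CREATE TABLE {table} ({', '.join(header)})")
    placeholders = ", ".join("?" for _ in header)
    conn.executemany(f"INSERT INTO {table} VALUES ({placeholders})", data)

conn = sqlite3.connect(":memory:")
load_csv(conn, "emps", EMPS_CSV)
load_csv(conn, "depts", DEPTS_CSV)

# Same shape as the demo query: employees per department name.
result = conn.execute(
    "SELECT depts.name, count(*) "
    "FROM emps JOIN depts USING (deptno) "
    "GROUP BY depts.name "
    "ORDER BY depts.name"
).fetchall()
print(result)
```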

More adapters
Adapters   Embedded             Planned
CSV        Cascading (Lingual)  HBase (Phoenix)
JDBC       Apache Drill         Spark
MongoDB                         Cassandra
Splunk                          Mondrian
linq4j

SQL on NoSQL
• No schema or schema-on-read
  • Optiq MAP columns, dynamic schema
• Nested data (Drill, MongoDB)
  • Optiq ARRAY columns
• Source system optimized for transactions & focused reads, not analytic queries
  • Optiq cache and materializations

Example #2: MongoDB
• Mongo’s standard “zips” data set with all US zip codes
• Raw ZIPS table with _MAP column
• ZIPS view extracts named fields, including nested latitude and longitude, and converts them to SQL types
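
The raw-table-plus-view idea can be sketched in a few lines of Python. This is an illustration only, not the Optiq MongoDB adapter: the function name and output column names are assumptions, and the two documents are sample rows in the shape of the zips data set, where loc holds [longitude, latitude].

```python
# Raw "table": one _MAP column holding the whole document, no declared schema.
RAW_ZIPS = [
    {"_MAP": {"_id": "01001", "city": "AGAWAM",
              "loc": [-72.622739, 42.070206], "pop": 15338, "state": "MA"}},
    {"_MAP": {"_id": "01002", "city": "CUSHMAN",
              "loc": [-72.51565, 42.377017], "pop": 36963, "state": "MA"}},
]

def zips_view(raw_rows):
    """Extract named, typed fields from each document, like a SQL view with casts."""
    for row in raw_rows:
        doc = row["_MAP"]
        yield {
            "id": str(doc["_id"]),
            "city": str(doc["city"]),
            "longitude": float(doc["loc"][0]),  # nested array element 0
            "latitude": float(doc["loc"][1]),   # nested array element 1
            "pop": int(doc["pop"]),
            "state": str(doc["state"]),
        }

zips = list(zips_view(RAW_ZIPS))
print(zips[0]["city"], zips[0]["latitude"], zips[0]["longitude"])
```

Once the view exists, queries can refer to ordinary typed columns such as latitude and pop instead of digging into the map on every access.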

Splunk
• NoSQL database
• Every log file in the enterprise
• A single “table”
  • A record for every line in every log file
  • A column for every field that exists in any log file
• No schema

Example #3: Splunk + MySQL
SELECT p.product_name,
COUNT(*) AS c
FROM splunk.splunk AS s
JOIN mysql.products AS p
ON s.product_id = p.product_id
WHERE s.action = 'purchase'
GROUP BY p.product_name
ORDER BY c DESC
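
One plausible way to split this query across the two systems is to hand the action = 'purchase' filter to the log store and do the join and aggregation afterwards. The sketch below is only an assumption about the shape of such a plan, not a description of what Optiq generates; the data and the functions search_logs and purchases_by_product are hypothetical stand-ins for Splunk and MySQL.

```python
from collections import Counter

# Hypothetical stand-ins for the two back ends.
LOG_EVENTS = [
    {"action": "purchase", "product_id": 1},
    {"action": "view", "product_id": 1},
    {"action": "purchase", "product_id": 2},
    {"action": "purchase", "product_id": 1},
]
PRODUCTS = {1: "Widget", 2: "Gadget"}  # product_id -> product_name

def search_logs(action):
    """Stand-in for a search that runs inside the log store (filter pushed down)."""
    return [e for e in LOG_EVENTS if e["action"] == action]

def purchases_by_product():
    """Join filtered log events to products, count per name, sort by count descending."""
    purchases = search_logs("purchase")
    counts = Counter(
        PRODUCTS[e["product_id"]]
        for e in purchases
        if e["product_id"] in PRODUCTS  # inner join: drop unmatched events
    )
    return sorted(counts.items(), key=lambda kv: kv[1], reverse=True)

print(purchases_by_product())
```

Only the purchase events ever leave the log store, which matters when the "table" holds every log line in the enterprise.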

Interactive analytics on NoSQL?
• NoSQL operational DB (e.g. HBase, MongoDB, Cassandra)
• Analytic queries aggregate over full scan
• Speed-of-thought response (< 5 seconds)
• Data freshness (< 10 minutes)

Simple analytics problem
• 100M U.S. census records
• 1KB each record, 100GB total
• 4 SATA3 disks, total 1.2GB/s
• How to count all records in under 5s?
• Not possible?! It takes more than 80s just to read the data.
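
The arithmetic behind that figure, as a quick check. The only assumption is that "1KB" and "GB" are decimal units (1,000 bytes and 10^9 bytes).

```python
RECORDS = 100_000_000            # 100M census records
RECORD_BYTES = 1_000             # 1KB per record (decimal units assumed)
DISK_BYTES_PER_SEC = 1.2e9       # 4 SATA3 disks, 1.2GB/s combined
TARGET_SECONDS = 5               # "speed of thought" budget

total_bytes = RECORDS * RECORD_BYTES                  # 100GB
scan_seconds = total_bytes / DISK_BYTES_PER_SEC       # time for one full read
required_bytes_per_sec = total_bytes / TARGET_SECONDS # throughput needed for 5s
shortfall = required_bytes_per_sec / DISK_BYTES_PER_SEC

print(f"full scan: {scan_seconds:.1f}s")
print(f"needed: {required_bytes_per_sec / 1e9:.0f}GB/s, "
      f"about {shortfall:.1f}x what the disks deliver")
```

A full read takes a little over 83 seconds, and hitting the 5-second target by brute force would need roughly 20GB/s, so the data actually touched per query has to shrink by more than an order of magnitude.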

Solution: Cheat!
• Compress data
• Column-oriented storage
• Store data in sorted order
• Put data in memory
• Cache previous query results
• Pre-compute (materialize) aggregates

Mondrian on Optiq on NoSQL (under construction)

Hybrid analytic architecture
1. NoSQL source system
2. Access via Optiq SQL
3. Optiq rewrites queries to use materialized data (e.g. aggregate tables)
4. Cached results are treated as “in-memory tables”
5. Materializations go offline dynamically as underlying data changes, and go online as they are refreshed
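
The online/offline life cycle in steps 3 to 5 can be sketched as follows. This is a toy model under stated assumptions, not Optiq's implementation: the class MaterializedCount, the function count_by, and the census rows are all hypothetical, and the "rewrite" is a single if-statement that uses the precomputed aggregate only while it is marked online.

```python
def count_rows(rows, key):
    """GROUP BY key, COUNT(*) by scanning every row."""
    counts = {}
    for row in rows:
        counts[row[key]] = counts.get(row[key], 0) + 1
    return counts

class MaterializedCount:
    """A precomputed GROUP BY key / COUNT(*) over a list of row dicts."""

    def __init__(self, source_rows, key):
        self.source_rows = source_rows
        self.key = key
        self.counts = {}
        self.online = False
        self.refresh()

    def refresh(self):
        """Recompute from the source and go back online."""
        self.counts = count_rows(self.source_rows, self.key)
        self.online = True

    def invalidate(self):
        """Go offline because the underlying data changed."""
        self.online = False

def count_by(source_rows, key, materialization):
    """Answer the query from the materialization if usable, else scan the source."""
    if materialization.online and materialization.key == key:
        return dict(materialization.counts), "materialization"
    return count_rows(source_rows, key), "full scan"

census = [{"state": "CA"}, {"state": "NY"}, {"state": "CA"}]
mv = MaterializedCount(census, "state")

first = count_by(census, "state", mv)    # answered from the aggregate
census.append({"state": "TX"})           # source data changes...
mv.invalidate()                          # ...so the aggregate goes offline
second = count_by(census, "state", mv)   # falls back to a full scan
mv.refresh()                             # aggregate refreshed, back online
third = count_by(census, "state", mv)

print(first, second, third, sep="\n")
```

The query text never changes; only the plan chosen to answer it does, which is what lets results stay correct while materializations come and go.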

3 things (reprise)
1. SQL allows you to reorganize your data and optimize your queries, while still using your NoSQL database for what it does best.
2. Optiq is not a database. It lets you create very powerful federated data architectures.
3. Access your data using Optiq adapters. Write schemas in JSON or use schema SPI. Connect via JDBC.

Thanks!
@julianhyde
optiq https://github.com/julianhyde/optiq
optiq-csv https://github.com/julianhyde/optiq-csv
drill http://incubator.apache.org/drill/
lingual http://www.cascading.org/lingual/
mondrian http://mondrian.pentaho.com
blog http://julianhyde.blogspot.com/
book http://manning.com/back/ 45% off code mlnosql13
