SQL++ FOR BIG DATA
Same Language, More Power
Matthew D. Groves
SQL, for the win
https://insights.stackoverflow.com/survey/2019
AGENDA
01/ SQL & Relational
02/ NoSQL
03/ Analytics & Reporting
04/ Summary & More Resources
01/ SQL & RELATIONAL
Relational and SQL
• Relational
  • E.F. Codd invented the relational model
  • Alpha, Codd's query language (never implemented)
• SQL
  • Created by Don Chamberlin & Raymond Boyce
  • Designed to be English-friendly
  • "SQL" and "relational" are now synonyms
Criticisms of SQL/relational
• Impedance mismatch
• Scaling
• Inflexibility
Impedance mismatch
ShoppingCart
ID  Username  DateCreated
1   mgroves   2019-06-13
2   agroves   2019-06-14
...
ShoppingCartItems
CartID  Item     Price  Qty
1       hat      12.99  1
1       socks    11.99  1
2       t-shirt  15.99  1
...
using System.Collections.Generic;

public class ShoppingCart
{
    public int Id;
    public string Username;
    public List<Item> Items;
}

public class Item { public string Name; public decimal Price; public int Qty; }
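In the database, five rows spread across two tables make up what the application sees as just two carts. The Python sketch below illustrates the regrouping that an OR/M performs to rebuild the nested `ShoppingCart` shape. Python is used only for illustration. The field names (`items`, `dateCreated`, and so on) and the `assemble_carts` helper are hypothetical, and only the rows visible on the slide are included.

```python
from collections import defaultdict

# Rows as they might come back from the two tables on the slide
# (only the visible rows; the "..." rows are omitted).
cart_rows = [
    (1, "mgroves", "2019-06-13"),
    (2, "agroves", "2019-06-14"),
]
item_rows = [
    (1, "hat", 12.99, 1),
    (1, "socks", 11.99, 1),
    (2, "t-shirt", 15.99, 1),
]

def assemble_carts(cart_rows, item_rows):
    """Hypothetical helper: group flat item rows under their parent cart."""
    items_by_cart = defaultdict(list)
    for cart_id, item, price, qty in item_rows:
        items_by_cart[cart_id].append({"item": item, "price": price, "qty": qty})
    # One nested object per cart row; carts without items get an empty list.
    return [
        {"id": cart_id, "username": username, "dateCreated": created,
         "items": items_by_cart[cart_id]}
        for cart_id, username, created in cart_rows
    ]

carts = assemble_carts(cart_rows, item_rows)
```

A JSON document store would hold each element of `carts` directly as a single document, so this regrouping step goes away.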
Scaling
Vertical vs. horizontal
Inflexibility
Diagram: Customer, Billing, Connections, Purchases, Contacts
Disclaimer!
• A relational database may be…
02/ NOSQL / SQL++
JSON data is NoSQL data
Example 1
document key: airline_5209
{
"callsign": "UNITED",
"country": "United States",
"name": "United Airlines",
"type": "airline"
}
Example 2
document key: route_55758
{
"airline": "UA",
"airlineid": "airline_5209",
"destinationairport": "ORD",
"distance": 1050.394306634423,
"equipment": "ER4 ERJ",
"schedule": [
{ "day": 0, "flight": "UA479", "utc": "15:05:00" },
{ "day": 1, "flight": "UA842", "utc": "02:27:00" },
{ "day": 1, "flight": "UA252", "utc": "03:00:00" },
// ... etc ...
],
"sourceairport": "CMH",
"stops": 0,
"type": "route"
}
NoSQL basic operations
• Get by key
• Set by key
• Delete by key
• Other "operational" queries
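A document database is essentially a key/value store: the key is a string such as `airline_5209`, and the value is the JSON document. The toy Python sketch below implements the three basic by-key operations. The `DocumentStore` class and its method names are hypothetical, and the sketch does not model any particular product's API.

```python
import json

class DocumentStore:
    """Hypothetical in-memory key/value store: string keys, JSON document values."""

    def __init__(self):
        self._docs = {}

    def set(self, key, doc):
        # Store a serialized copy so later changes to `doc` don't leak in.
        self._docs[key] = json.dumps(doc)

    def get(self, key):
        # Return a fresh copy of the document, or None if the key is absent.
        raw = self._docs.get(key)
        return None if raw is None else json.loads(raw)

    def delete(self, key):
        # Deleting a missing key is a no-op.
        self._docs.pop(key, None)

store = DocumentStore()
store.set("airline_5209", {"callsign": "UNITED", "country": "United States",
                           "name": "United Airlines", "type": "airline"})
```

These three operations require only the key. Anything that searches by the contents of the documents falls under the "other operational queries" bullet.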
What about reporting and analytics?
Problems:
1. Large amounts of data
2. Queries against the data could impact operations
03/ ANALYTICS & REPORTING
Operational vs Analytics vs Operational Analytics
Operational Analytics workload
• Fewer queries
• Ad hoc
• Could be complex
• Performance is nice-to-have
How are operational analytics done?
Answer 1: ¯\_(ツ)_/¯
Answer 2: Export to relational
Data → ETL → SQL
Answer 3: Hadoop?
https://medium.com/@ylashin/big-data-using-hdinsight-a-journey-in-the-zoo-ecosystem-c78b913a5ed9
Answer 4: SQL++
SQL Example
mytable
ID  foo   bar     baz
1   matt  groves  qux
2   ali   groves  notqux
3   emma  groves  notqux
SELECT foo, bar
FROM mytable
WHERE baz = 'qux'
SQL++ Example
mybucket
key: 1
{
"foo" : "matt",
"bar" : "groves",
"baz" : "qux"
}
key: 2
{
"foo" : "ali",
"bar" : "groves",
"baz" : "notqux"
}
key: 3
{
"foo" : "emma",
"bar" : "groves",
"baz" : "notqux"
}
SELECT foo, bar
FROM mybucket
WHERE baz = 'qux'
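The query text is identical in both examples; only the data underneath changes from rows to JSON documents. As a rough Python sketch (Python for illustration only; the helper name is hypothetical), here is what `FROM`, `WHERE`, and `SELECT` do over the three `mybucket` documents.

```python
mybucket = {
    "1": {"foo": "matt", "bar": "groves", "baz": "qux"},
    "2": {"foo": "ali", "bar": "groves", "baz": "notqux"},
    "3": {"foo": "emma", "bar": "groves", "baz": "notqux"},
}

def select_foo_bar_where_baz(bucket, value):
    """Hypothetical helper, roughly: SELECT foo, bar FROM mybucket WHERE baz = value"""
    return [
        {"foo": doc["foo"], "bar": doc["bar"]}   # SELECT: project two fields
        for doc in bucket.values()               # FROM: iterate the documents
        if doc.get("baz") == value               # WHERE: keep matching documents
    ]

result = select_foo_bar_where_baz(mybucket, "qux")
```

The sketch assumes every document has `foo` and `bar`, and it returns results in insertion order. A real engine guarantees no particular order without `ORDER BY`.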
SQL++ Research Project
SQL++ is backwards compatible
• JOIN
• UNION
• aggregation / GROUP BY
• SELECT
• LET
• LIMIT
• ORDER BY
• etc…
SQL++ has superpowers
Superpower: Nested Objects
myusers
key 1
{
"name" : "matt",
"address" : {
"street" : "White Rd",
"city" : "Grove City",
"state" : "OH"
}
}
key 2
{
"name" : "emma",
"address" : {
"street" : "High St",
"city" : "Columbus",
"state" : "OH"
}
}
SELECT address.city
FROM myusers
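The dotted path `address.city` walks from a document into its nested object. The Python sketch below shows that navigation. The `get_path` helper and the `MISSING` sentinel are hypothetical. The sentinel is a nod to SQL++, which distinguishes an absent field (MISSING) from an explicit NULL.

```python
myusers = [
    {"name": "matt", "address": {"street": "White Rd", "city": "Grove City", "state": "OH"}},
    {"name": "emma", "address": {"street": "High St", "city": "Columbus", "state": "OH"}},
]

MISSING = object()  # hypothetical stand-in for "this field is not present"

def get_path(doc, path):
    """Follow a dotted path such as 'address.city' through nested objects."""
    current = doc
    for part in path.split("."):
        if not isinstance(current, dict) or part not in current:
            return MISSING
        current = current[part]
    return current

cities = [get_path(u, "address.city") for u in myusers]
```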
Superpower: Arrays
myusers
key 1
{
"name" : "matt",
"favoriteFoods" : [
"pizza",
"cheesecake",
"donuts"
]
}
key 2
{
"name" : "emma",
"favoriteFoods" : [
"donuts",
"Lucky Charms",
"chicken"
]
}
SELECT favoriteFoods[1]
FROM myusers
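The Python sketch below assumes zero-based array indexing, as in Couchbase's N1QL. Under that assumption, `favoriteFoods[1]` selects each user's second favorite food. The `element_at` helper is hypothetical. It returns `None` for out-of-range positions and ignores negative indexes. How a given engine reports an out-of-range index is not covered here.

```python
myusers = [
    {"name": "matt", "favoriteFoods": ["pizza", "cheesecake", "donuts"]},
    {"name": "emma", "favoriteFoods": ["donuts", "Lucky Charms", "chicken"]},
]

def element_at(array, index):
    """Zero-based access; out-of-range (or negative) indexes yield None instead of raising."""
    if 0 <= index < len(array):
        return array[index]
    return None

second_foods = [element_at(u["favoriteFoods"], 1) for u in myusers]
```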
Superpower: UNNEST
myusers
key 1
{
"name" : "matt",
"favoriteFoods" : [
"pizza",
"cheesecake",
"donuts"
]
}
SELECT food, u.name
FROM myusers u
UNNEST u.favoriteFoods food;
[
{
"food": "pizza",
"name": "matt"
},
{
"food": "cheesecake",
"name": "matt"
},
{
"food": "donuts",
"name": "matt"
}
]
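`UNNEST` behaves like a join between each document and the elements of its own array, yielding one result per element. The Python sketch below reproduces the output above with a nested loop. The helper name is hypothetical. Like an inner join, the sketch emits nothing for a user whose array is empty or absent.

```python
myusers = [
    {"name": "matt", "favoriteFoods": ["pizza", "cheesecake", "donuts"]},
]

def unnest_foods(users):
    """Hypothetical helper, roughly:
    SELECT food, u.name FROM myusers u UNNEST u.favoriteFoods food"""
    rows = []
    for u in users:                              # FROM myusers u
        for food in u.get("favoriteFoods", []):  # UNNEST: one row per array element
            rows.append({"food": food, "name": u["name"]})
    return rows

rows = unnest_foods(myusers)
```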
Superpower: Quantification
myusers
key 1
{
"name" : "matt",
"favoriteFoods" : [
"pizza",
"cheesecake",
"donuts"
]
}
key 2
{
"name" : "emma",
"favoriteFoods" : [
"donuts",
"Lucky Charms",
"chicken"
]
}
SELECT u.name
FROM myusers u
WHERE ANY f IN u.favoriteFoods
      SATISFIES f == 'pizza' END;
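`ANY ... SATISFIES ... END` keeps a document when at least one array element passes the test. The "every" form keeps it only when all elements pass. The Python sketch below maps these onto `any()` and `all()`. Both helper names are hypothetical. Python's `all()` is true for an empty list, so check your engine's documentation for how its "every" operator treats empty arrays.

```python
myusers = [
    {"name": "matt", "favoriteFoods": ["pizza", "cheesecake", "donuts"]},
    {"name": "emma", "favoriteFoods": ["donuts", "Lucky Charms", "chicken"]},
]

def names_with_favorite(users, wanted):
    """Roughly: SELECT u.name FROM myusers u
    WHERE ANY f IN u.favoriteFoods SATISFIES f == wanted END"""
    return [u["name"] for u in users
            if any(f == wanted for f in u.get("favoriteFoods", []))]

def names_where_every(users, predicate):
    """Every-style filter: keep users whose foods all satisfy `predicate`."""
    return [u["name"] for u in users
            if all(predicate(f) for f in u.get("favoriteFoods", []))]
```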
SQL++ Implementations
• Couchbase
• AsterixDB
• Apache Drill
• Others coming soon?
Implementation 1: Couchbase
SQL++
Implementation 2: AsterixDB
Implementation 3: Apache Drill
04/ SUMMARY
NoSQL doesn't mean NoSQL anymore
No SQL → ++SQL
SQL++ is SQL with JSON Superpowers
Minimize your ETL, maximize your SQL skills
ETL 👎  SQL 👍
Resources: SQL/scaling
• E.F. Codd original research paper
  • http://db.dobo.sk/wp-content/uploads/2015/11/Codd_1970_A_relational_model.pdf
• The Free Lunch is Over
  • http://www.gotw.ca/publications/concurrency-ddj.htm
• Original SEQUEL paper
  • https://dl.acm.org/citation.cfm?id=811515
Resources: UCSD Research
• UCSD
  • http://forward.ucsd.edu/sqlpp.html
• The SQL++ Query Language
  • https://arxiv.org/abs/1405.3631
Resources: Don Chamberlin
• Book: SQL++ for SQL Users
  • Amazon: https://www.amazon.com/SQL-Users-Tutorial-Don-Chamberlin/dp/0692184503/
  • Free PDF: https://resources.couchbase.com/sql_tutorial
• Videos
  • NoSQL and SQL++, two sides of the same coin: https://www.youtube.com/watch?v=KGKiSyJa0-k
  • Tech Panel on Query Language Evolution: https://www.youtube.com/watch?v=LAlDe1w7wxc
Resources: Me!
• @mgroves
• twitch.tv/matthewdgroves
• forums.couchbase.com
• Find me after this session!

Intro to SQL++ - Detroit Tech Watch - June 2019

Editor's Notes

  1. Show that SQL is popular, per the Stack Overflow 2019 survey. It's about the same as last year, in the 55-60% range. Popular doesn't necessarily equal good, of course, but if you look at the top 3, they are all in the "lingua franca" category. SQL rules data. https://insights.stackoverflow.com/survey/2019
  2. E.F. Codd did a lot of great theoretical work and research, including the invention of the relational database. There's an interesting quote from his original paper that describes one of the fundamental tradeoffs between relational and non-relational data, which we'll explore today. After his initial paper, he designed a language called "Alpha", which was never implemented but was influential.
  3. In the database we have 5 pieces of data stored, for what is actually 2 shopping carts as they exist in the application. We have tools to attempt to deal with this, mainly OR/Ms, and they mostly do a good job… mostly.
  4. The easiest way to scale a relational database is vertically, but this can get expensive and eventually hits a ceiling (The Free Lunch is Over). Horizontal scaling can be cheaper and can scale bigger, but it is difficult to do with relational.
  5. Rise of agile methodologies: "we value responding to change over following a plan". Schema changes: a simple change is moving a "credit card number" field from customer to a new "billing" table with a foreign key. That's a simple example, but even that, with a large enough database, could have a huge impact. The more complex the schema change and the bigger the database, the more impact it has, which means the more expensive/risky the change will be.
  6. I'm not here to convince you that relational is dead! Relational may be fine if you are working with small data sets (for some definition of small), you are working with simple/rarely changing data structures (for some definition of simple/rarely), or you aren't feeling performance/scaling pain (yet). But don't turn off your mind yet. You aren't facing these problems now, but you may face them in the future.
  7. So what if it's not fine?
  8. Isolated pieces of data: "documents". They can be sharded/split between any number of nodes. (For some reason, when I think of "shards" I think of the crystals that Superman has in the Fortress of Solitude.)
  9. This is a simple example. Flat data: you could easily imagine this as a row in a table. Notice the document KEY. A document database is basically a key/value store: the value is the JSON and the key is some string. This may look slightly different from database to database, but they all have a key somewhere.
  10. A more complex example. In relational, the 'schedule' element would be at least one separate table with foreign keys. It's all domestic data here. No mismatch, easy to scale, no joining required. There's no schema to follow, so I could add other fields TO JUST THIS ONE DOCUMENT if necessary. Don't ALWAYS normalize: notice the 'airlineid' field.
  11. Other operational queries: map/reduce; Mongo has a JavaScript-like query language; Couchbase uses SQL for operational queries.
  12. Suppose your database is used for the backend of an e-commerce site. Everything is humming along nicely: customers are adding items to shopping carts, making purchases, and browsing the catalog with well-known, well-indexed queries. Suddenly I come along trying to create a report. I run a complicated or ad hoc query that I don't have proper indexing, sizing, or tuning for, and my query impacts customers: it slows them down, or worse, causes timeouts.
  13. Define these terms; talk more about the differences, and when to use each one, later. Operational: the moment-to-moment data operations and queries that your website needs to function in order to serve customers. Analytical: the operations and queries that you need to serve customers in the extreme long run, over the extreme history: data science, etc. Operational analytics: sits between them, closer to real time, perhaps analyzing only the last 6 months or maybe even the last hour of data: dashboards, reporting, trend analysis.
  14. Far fewer analysts than customers (hopefully?). Queries are more ad hoc in nature. Queries might be VERY complex. Performance is still nice, but less important.
  15. There are 4 methods that I'm aware of, and I have experience with most of them.
  16. I dunno? We don't really have a plan for this; we don't think about it. We have a bunch of Access databases? We copy the operational data when we want to? Or we just link to it directly and hope no one screws it up?
  17. Export it to a relational database and use SQL. Create/maintain or buy an ETL. Impedance mismatch (again!). Size/performance.
  18. Hadoop is designed for massive scale, not massive speed. It's analytics, but it's not operational analytics. Using Hadoop and the Hadoop ecosystem is a whole other topic. It may be too big of a hammer, or too slow of a hammer, for operational analytics. Answer 3: Hadoop or something like it. It's still an ETL problem: Kafka, Sqoop, Flume, etc. How do we actually create queries? Pig, Hive, Spark, etc. Designed for petabytes+. Of the two types of analytics, this is the data lake: analyzing data from the entire history of the company. https://medium.com/@ylashin/big-data-using-hdinsight-a-journey-in-the-zoo-ecosystem-c78b913a5ed9
  19. You already know how to write SQL. Designed to work with richly structured data. Minimal or no ETL required. This is the cover of a book, and notice the author.
  20. As Don Chamberlin says, JSON kinda looks like tables "if you squint hard enough".
  21. SQL++ was a research project from UCSD in 2015 (https://arxiv.org/abs/1405.3631). Couchbase's N1QL (operational) is the first implementation of this research paper.
  22. The language itself is the same. The underlying data is different: it's not tables and rows, it's collections of JSON documents.
  23. SQL is made for flat relational data. SQL++ takes it a step further to deal with structured data, and therefore it has some superpowers.
  24. In JSON you can have nested objects: objects within objects, like address here. How do I select that, project that, etc.? The answer is dotted syntax.
  25. Addressing arrays with square brackets.
  26. We may want to flatten that array in order to filter on the values. In relational, "favoriteFoods" would be a separate table. In JSON it's not, but we might want to do an "intra-document" join. UNNEST flattens out the array and basically joins each array value to its parent document.
  27. Quantification means filtering on the contents of an array: checking whether any or all items in an array satisfy some criteria. For instance, I want to find all users who have "pizza" as a favorite food.
  28. Analytics. Workload isolation. A "shadow copy" created with two commands. It technically IS an ETL, but it is real time, it's created with two simple commands, and it's otherwise completely automated. I'll show you a demo of this later. Workload isolation, read only.
  29. A "big data management system". Data ingestion (ETL) with a variety of built-in adapters (local filesystem, HDFS, socket, Twitter, RSS), and it's extensible. Couchbase is essentially using a customized version of AsterixDB under the hood.
  30. No ETL required. It seems to access data directly ("in-place analytics"), which could be a workload isolation problem (operational vs. analytics). It can connect to a wide variety of databases.
  31. They say you only remember 3 things from any presentation, so here they are
  32. Codd research paper: http://db.dobo.sk/wp-content/uploads/2015/11/Codd_1970_A_relational_model.pdf (may not be a good link in the long run, but it's free). The Free Lunch is Over: http://www.gotw.ca/publications/concurrency-ddj.htm. SEQUEL paper: https://dl.acm.org/citation.cfm?id=811515 (I couldn't find a free copy).
  33. http://forward.ucsd.edu/sqlpp.html (SQL++ is part of the FORWARD project). https://arxiv.org/abs/1405.3631 (the paper on arXiv, which is hosted by Cornell).
  34. Book: https://www.amazon.com/SQL-Users-Tutorial-Don-Chamberlin/dp/0692184503/ Free PDF: https://resources.couchbase.com/sql_tutorial Videos: https://www.youtube.com/watch?v=KGKiSyJa0-k and https://www.youtube.com/watch?v=LAlDe1w7wxc
  35. If anything looks interesting to you, or you have questions or feedback, come talk to me afterwards. I want to hear from you! My boss says I have to listen to you; it's my job. So now's your chance :)