What SQL should actually be...

Rafael Ördög, PhD.

Would you ever do this?
function initializeAutoSave() {
    // "Save" by clicking the UI button every 2 seconds, then re-arm the timer
    setTimeout(function() {
        $('input#save').click();
        initializeAutoSave();
    }, 2000);
}

How about doing this?
function createThumbnail($sourceImage) {
    $size = Config::get('thumb-size');
    $targetImage = $this->getTargetName($sourceImage);
    // Shell out to ImageMagick's command line tool to do the resizing
    system("convert {$sourceImage} -resize {$size}x{$size} {$targetImage}");
}

And yet we somehow accept this...
$dbh->query("SELECT field FROM table");

And if we loathe this:
<script type="text/javascript">
<?php foreach($buttons as $button):?>
initButton(<?=$button["id"]?>,
<?=$button["usefulVar"]?>);
<?php endforeach; ?>
</script>

Why aren't we horrified by this?
function doSomeImportantAndComplexStuffInTheDatabase() {
    // The statement is glued together from generated SQL fragments
    $this->dbh->exec("UPDATE {$this->generateComplexJoins()} " .
        "SET {$this->generateFieldsAndValueSubQueries()} " .
        "WHERE {$this->generateWhere()}");
}
function generateWhere() {
    $result = "TRUE";
    foreach ($this->importantStuffs as $stuff) {
        $result .= " AND " . $stuff->subWhere();
    }
    return $result;
}

Donald D. Chamberlin
the father of SQL

Origin of SQL
SEQUEL: A STRUCTURED ENGLISH QUERY LANGUAGE (1974)
by Donald D. Chamberlin and Raymond F. Boyce

Origin of SQL
SEQUEL: A STRUCTURED ENGLISH QUERY LANGUAGE (1974)
"However, there is also a large class of users
who, while they are not computer specialists,
would be willing to learn to interact with a
computer in a reasonably high-level, non-procedural
query language."

SQL is a user interface
● It's not an API
○ So that's why we need an ORM tool...
● It's not a protocol
● It's not even designed for programmers
● It was, however, derived from an earlier query
language called SQUARE, which looks a bit more
like a protocol:
○ NAME, SAL EMP DEPT, MGR ('TOY', 'ANDERSON')
VS
○ SELECT NAME, SAL FROM EMP WHERE DEPT = 'TOY' AND MGR = 'ANDERSON'

The command-line user interface is not an API
● Leaking logic to other languages
● Dynamically generated code is hard to debug
● Security issues
○ Escaping is a horror scene
● Large overhead
○ Process launch overhead
○ Parse overhead
○ Command generation overhead
● Fragility
○ It's more prominent with a GUI, but CLIs are not much better
○ Have you ever tried to maintain a moderately sized
Greasemonkey script? It's a nightmare!
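The "escaping is a horror scene" point fits in a few lines. This is a sketch with a hypothetical `findUserSql` helper and a hypothetical `users` table: when a value is spliced into command text, an ordinary apostrophe breaks the quoting and a crafted value rewrites the command. The last part shows the usual alternative of keeping the value as data next to the command text, modelled on prepared statements with bound parameters (as offered by PHP's PDO, for example).

```javascript
// Hypothetical helper that builds a command by string concatenation,
// the way CLI invocations and SQL strings are often generated.
function findUserSql(name) {
  return "SELECT id FROM users WHERE name = '" + name + "'";
}

// An ordinary name breaks the quoting:
console.log(findUserSql("O'Brien"));
// SELECT id FROM users WHERE name = 'O'Brien'

// A crafted value changes what the command means:
console.log(findUserSql("x' OR '1'='1"));
// SELECT id FROM users WHERE name = 'x' OR '1'='1'

// Alternative: the value travels as data, separate from the command text,
// so there is nothing to escape (the shape mirrors a prepared statement
// with a bound parameter).
const findUser = {
  text: "SELECT id FROM users WHERE name = ?",
  params: ["O'Brien"],
};
```

Note that even the parameterized form is still a string command; it only moves the values out of it.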

SQL is a bad UI by today's standards,
but it's even worse as an API
● Fails to separate concerns
○ Changing a query to improve performance may involve
breaking business logic
○ Requesting a little more data can have a large
performance hit
○ You can't add query optimizations as a separate
aspect (AOP)
● Leaking concepts
○ We leak our entire data structure to the DB
■ That is why a good ORM should generate DDL from
source code and not the other way around
○ To solve performance issues, we may even leak some of
our business logic (e.g. aggregating data)
■ Into the one component that is hard to scale
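Generating DDL from source code can be sketched briefly. This assumes a hypothetical schema format (a plain object mapping field names to type names), a hypothetical `toCreateTable` helper and an assumed type mapping; a real ORM handles far more (keys, constraints, migrations). The point is only the direction of the dependency: the application owns the structure and the database definition is derived from it.

```javascript
// Hypothetical source-level schema: the application owns the structure,
// and the DDL is derived from it, not the other way around.
const orderSchema = {
  table: "orders",
  primaryKey: "id",
  fields: {
    id: "integer",
    customer_id: "integer",
    total: "decimal",
    created_at: "timestamp",
  },
};

// Assumed mapping from schema type names to SQL column types.
const SQL_TYPES = {
  integer: "INTEGER",
  decimal: "DECIMAL(12, 2)",
  timestamp: "TIMESTAMP",
  string: "VARCHAR(255)",
};

// Derive a CREATE TABLE statement from the schema object.
function toCreateTable(schema) {
  const columns = Object.entries(schema.fields).map(([name, type]) => {
    const sqlType = SQL_TYPES[type];
    if (!sqlType) throw new Error(`Unknown field type: ${type}`);
    return `  ${name} ${sqlType}`;
  });
  columns.push(`  PRIMARY KEY (${schema.primaryKey})`);
  return `CREATE TABLE ${schema.table} (\n${columns.join(",\n")}\n);`;
}

console.log(toCreateTable(orderSchema));
```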

Origin of SQL
SEQUEL: A STRUCTURED ENGLISH QUERY LANGUAGE (1974)
"SEQUEL identifies a set of simple operations
on tabular structures, which can be shown to
be of equivalent power to the first order
predicate calculus."

Non-tabular structures
● Connections between people
● Ownership relations
● Documents (like articles or presentations)
● Data that belongs to a video on YouTube:
○ Video
○ Comments
○ Likes
○ etc.
● Or more abstractly
○ hierarchies
○ graphs

So we have non-tabular data
[Diagram: an order with its customer, order id, several order items and payment details]

And tables to store that in
[Diagram: the same order next to four tables (Orders, Customers, Items, Payments); the gap between them is labeled "Impedance mismatch"]

Data normalization
[Diagram: the order's customer, order id, order items and payment details being split into tables]
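To make the normalization step concrete, here is a sketch that splits the nested order from the diagrams into flat rows and then rebuilds it. The field names and the `normalizeOrder` / `loadOrder` helpers are hypothetical; what the sketch shows is that reading back one logical object means joining four tables.

```javascript
// A nested order, roughly as the application thinks about it.
// All field names here are hypothetical.
const order = {
  id: 42,
  customer: { id: 7, name: "Alice" },
  items: [
    { sku: "A-1", qty: 2 },
    { sku: "B-9", qty: 1 },
    { sku: "C-3", qty: 5 },
  ],
  payment: { method: "card", amount: 99.5 },
};

// Normalize: one flat row per entity, linked by foreign keys.
function normalizeOrder(o) {
  return {
    customers: [{ id: o.customer.id, name: o.customer.name }],
    orders: [{ id: o.id, customer_id: o.customer.id }],
    items: o.items.map((it) => ({ order_id: o.id, sku: it.sku, qty: it.qty })),
    payments: [{ order_id: o.id, method: o.payment.method, amount: o.payment.amount }],
  };
}

// Rebuild the object: the "join" needed to get one logical order back.
function loadOrder(tables, orderId) {
  const row = tables.orders.find((r) => r.id === orderId);
  if (!row) return null;
  const customer = tables.customers.find((c) => c.id === row.customer_id);
  const payment = tables.payments.find((p) => p.order_id === orderId);
  return {
    id: row.id,
    customer: { id: customer.id, name: customer.name },
    items: tables.items
      .filter((it) => it.order_id === orderId)
      .map((it) => ({ sku: it.sku, qty: it.qty })),
    payment: { method: payment.method, amount: payment.amount },
  };
}

const tables = normalizeOrder(order);
console.log(Object.keys(tables)); // four tables for one order
console.log(JSON.stringify(loadOrder(tables, 42)) === JSON.stringify(order)); // true
```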

So we normalize our structures
● Strongly related data will be scattered all around
the hard drive
● Performance issues
● The DBA requests a denormalization
○ Again: changing code for performance reasons in a way
that potentially breaks business logic
● Denormalized data is not indexed by the SQL
database
○ So we create index tables...
● The code using the denormalized tables will be
a lot harder to maintain and understand

SQL tries too hard, and we abuse it
● SQL databases are more than just tabular data
stores
○ They enforce a data transfer mechanism
■ Why do I need to use TCP/IP to reach a local DB?
■ And I even need to authenticate!
○ They are indexing services, but with very limited
capabilities.
● Why do we use SQL databases for:
○ Storing temporary data locally (maybe files or memory?)
○ Storing documents (maybe document stores?)
○ Message queues (it's terrible for that!)
○ Inter process communication (I mean... REALLY?!)

It makes testing hard
● It enforces a data transfer mechanism, so running tests
against the database is really slow
○ Even if the data is in an in-memory table
○ So we don't test the DB... or only if we must...
● On the other hand, since it's a complicated and
frequently used API, one would be tempted to write a
complete fake
○ One that stores stuff in memory and won't use TCP/IP to
communicate
○ It's almost impossible to do that well, so we don't...
● But SQL queries may be changed for performance
optimization in ways that break logic...
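Faking all of SQL is impractical, but faking a narrow DB-layer interface is not. A sketch, assuming a hypothetical repository interface with just `save` and `findByCustomer`: the business logic (`totalSpent`, also hypothetical) depends only on those two methods, so tests can run against an in-memory fake with no TCP/IP and no authentication, while production would put a real database behind the same methods.

```javascript
// Hypothetical narrow DB-layer interface:
//   save(order)                -> void
//   findByCustomer(customerId) -> array of orders
// Tests use this in-memory fake; production would implement the same
// two methods on top of a real database.
class InMemoryOrderRepository {
  constructor() {
    this.orders = new Map(); // order id -> order
  }

  save(order) {
    // Store a shallow copy so callers can't mutate stored state by accident.
    this.orders.set(order.id, { ...order });
  }

  findByCustomer(customerId) {
    return [...this.orders.values()]
      .filter((o) => o.customerId === customerId)
      .map((o) => ({ ...o }));
  }
}

// Business logic depends only on the interface, not on SQL.
function totalSpent(repo, customerId) {
  return repo
    .findByCustomer(customerId)
    .reduce((sum, o) => sum + o.total, 0);
}

const repo = new InMemoryOrderRepository();
repo.save({ id: 1, customerId: 7, total: 30 });
repo.save({ id: 2, customerId: 7, total: 12 });
repo.save({ id: 3, customerId: 8, total: 100 });
console.log(totalSpent(repo, 7)); // 42
```

The fake stays small because the interface is small; a performance rewrite inside the real repository does not change what the tests exercise.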

So what would be better?
● A native API instead of string commands. I
should be able to independently specify:
○ what data to collect or save
○ how that should be done
○ what to index
○ the way these commands are sent to the db
● The API should be as simple as possible
● Schema-less data structures
○ And if you like static typing, you can define your
schema as data structures or classes
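A sketch of what this wish list could look like, reusing the EMP query from the SQUARE slide. Everything here is hypothetical (the `Collection` class, `createIndex`, `find` and its `limit` option, and the sample employee rows); it shows only that what to collect, how to run it, and what to index are specified independently, so adding an index changes performance but not results. Transport is left out: this collection simply lives in process memory.

```javascript
// Hypothetical in-memory collection with a native (non-string) query API.
class Collection {
  constructor(docs = []) {
    this.docs = docs;
    this.indexes = new Map(); // field -> Map(value -> array of docs)
  }

  // "What to index": declared separately from any query.
  // (Built once; this sketch has no insert method, so it never goes stale.)
  createIndex(field) {
    const index = new Map();
    for (const doc of this.docs) {
      if (!index.has(doc[field])) index.set(doc[field], []);
      index.get(doc[field]).push(doc);
    }
    this.indexes.set(field, index);
  }

  // "What to collect": equality conditions plus the fields to return.
  // "How": execution options such as limit.
  find(filter, fields, { limit = Infinity } = {}) {
    const conditions = Object.entries(filter);
    // Use an index if one exists for a filtered field; otherwise scan.
    const indexed = conditions.find(([field]) => this.indexes.has(field));
    const candidates = indexed
      ? this.indexes.get(indexed[0]).get(indexed[1]) ?? []
      : this.docs;
    return candidates
      .filter((doc) => conditions.every(([f, v]) => doc[f] === v))
      .slice(0, limit)
      .map((doc) => Object.fromEntries(fields.map((f) => [f, doc[f]])));
  }
}

const emp = new Collection([
  { NAME: "SMITH", SAL: 3000, DEPT: "TOY", MGR: "ANDERSON" },
  { NAME: "JONES", SAL: 2500, DEPT: "TOY", MGR: "BAKER" },
  { NAME: "BROWN", SAL: 2800, DEPT: "SHOE", MGR: "ANDERSON" },
]);

// SELECT NAME, SAL FROM EMP WHERE DEPT = 'TOY' AND MGR = 'ANDERSON'
const query = { DEPT: "TOY", MGR: "ANDERSON" };
const before = emp.find(query, ["NAME", "SAL"]);
emp.createIndex("DEPT"); // a performance decision only
const after = emp.find(query, ["NAME", "SAL"]); // same result as before
console.log(after);
```

Because the query is a plain object, it can be built, inspected and tested without any string generation or escaping.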

In an SQL database the data
structure is leaked to the DB
[Diagram: APP depends on the SQL DB, which holds the leaked data structure]

That is the primary reason we
introduced the DB layer
[Diagram: APP depends on the DB layer, which depends on the SQL DB; the leaked structure is still in the SQL DB]

A schema-less database puts the schema on
the right side of the DB layer boundary
[Diagram: APP depends on the DB layer, which depends on the NoSQL DB; the data structure sits on the application's side]
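A sketch of this last picture, under stated assumptions: `DocumentStore`, `Customer` and `CustomerLayer` are hypothetical names, and the store is in memory for illustration. The store accepts any plain document; the schema (required fields, types) lives in the application's `Customer` class, and the DB layer maps between the two.

```javascript
// Schema-less store: accepts any plain object under a key.
class DocumentStore {
  constructor() {
    this.docs = new Map();
  }
  put(key, doc) {
    // JSON round trip keeps only plain data, never class instances.
    this.docs.set(key, JSON.parse(JSON.stringify(doc)));
  }
  get(key) {
    const doc = this.docs.get(key);
    return doc === undefined ? undefined : JSON.parse(JSON.stringify(doc));
  }
}

// The schema lives in the application, as a class.
class Customer {
  constructor(id, name, tags = []) {
    if (typeof name !== "string" || name === "") throw new Error("name is required");
    this.id = id;
    this.name = name;
    this.tags = tags;
  }
}

// DB layer: maps between application objects and schema-less documents.
class CustomerLayer {
  constructor(store) {
    this.store = store;
  }
  save(customer) {
    this.store.put(`customer:${customer.id}`, {
      id: customer.id,
      name: customer.name,
      tags: customer.tags,
    });
  }
  load(id) {
    const doc = this.store.get(`customer:${id}`);
    return doc && new Customer(doc.id, doc.name, doc.tags);
  }
}

const layer = new CustomerLayer(new DocumentStore());
layer.save(new Customer(7, "Alice", ["vip"]));
const loaded = layer.load(7);
console.log(loaded instanceof Customer, loaded.name); // true Alice
```

Adding a field to `Customer` touches only the class and its mapping; the store itself needs no schema change (older documents may simply lack the new field).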

And by the way, we are hiring:
c0de-x.com
@devillsroom
