What SQL should actually be...
Rafael Ördög, PhD.
Would you ever do this?
function initializeAutoSave() {
  // "Saves" by clicking the Save button in the UI every two seconds
  setTimeout(function() {
    $('input#save').click();
    initializeAutoSave();
  }, 2000);
}
How about doing this?
function createThumbnail($sourceImage) {
  $size = Config::get('thumb-size');
  $targetImage = $this->getTargetName($sourceImage);
  // Shells out to ImageMagick's command line interface
  system("convert {$sourceImage} -resize {$size}x{$size} {$targetImage}");
}
And yet we somehow accept this...
$dbh->query("SELECT field FROM table");
And if we loathe this:
<script type="text/javascript">
<?php foreach($buttons as $button):?>
initButton(<?=$button["id"]?>,
<?=$button["usefulVar"]?>);
<?php endforeach; ?>
</script>
Why aren't we horrified by this?
function doSomeImportantAndComplexStuffInTheDatabase() {
  $dbh->exec("UPDATE {$this->generateComplexJoins()} " .
    "SET {$this->generateFieldsAndValueSubQueries()} " .
    "WHERE {$this->generateWhere()}");
}
function generateWhere() {
  $result = "TRUE";
  foreach ($this->importantStuffs as $stuff) {
    $result .= " AND " . $stuff->subWhere();
  }
  return $result;
}
Donald D. Chamberlin
the father of SQL
Origin of SQL
SEQUEL: A STRUCTURED ENGLISH QUERY LANGUAGE (1974)
by Donald D. Chamberlin and Raymond F. Boyce
"However, there is also a large class of users
who, while they are not computer specialists,
would be willing to learn to interact with a
computer in a reasonably high-level, non-procedural query language."
SQL is a user interface
● It's not an API
○ So that's why we need an ORM tool...
● It's not a protocol
● It's not even designed for programmers
● It was, however, derived from an earlier query language called SQUARE, which looks a bit more like a protocol:
○ NAME, SAL EMP DEPT, MGR ('TOY', 'ANDERSON')
VS
○ SELECT NAME, SAL FROM EMP WHERE DEPT = 'TOY' AND MGR = 'ANDERSON'
The command line user interface is not an API
● Leaking logic to other languages
● Dynamically generated code is hard to debug
● Security issues
○ Escaping is a horror scene
● Large overhead
○ Process launch overhead
○ Parse overhead
○ Command generation overhead
● Fragility
○ It's more prominent with a GUI, but CLIs are not much better
○ Have you ever tried to maintain a moderately sized
Greasemonkey script? It's a nightmare!
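On the escaping point: a common mitigation is to keep user values out of the command text entirely and hand them to the driver as bound parameters. Below is a minimal JavaScript sketch of the `generateWhere()` idea rewritten that way; `buildWhere`, the column whitelist and the `?` placeholder style are assumptions for illustration (placeholder syntax varies between drivers).

```javascript
// Hypothetical helper: builds a WHERE clause from { column: value } filters.
// Column names must come from a fixed whitelist; values never enter the SQL
// text and are returned separately, to be bound by the driver as parameters.
const ALLOWED_COLUMNS = new Set(["name", "dept", "mgr"]);

function buildWhere(filters) {
  const clauses = ["TRUE"];
  const params = [];
  for (const [column, value] of Object.entries(filters)) {
    if (!ALLOWED_COLUMNS.has(column)) {
      throw new Error(`Unknown column: ${column}`);
    }
    clauses.push(`${column} = ?`);
    params.push(value);
  }
  return { sql: "WHERE " + clauses.join(" AND "), params };
}

// The quote in O'Brien needs no escaping: it travels as a bound value.
const { sql, params } = buildWhere({ dept: "TOY", mgr: "O'Brien" });
console.log(sql); // WHERE TRUE AND dept = ? AND mgr = ?
console.log(params);
```

This removes the escaping problem, but the query is still assembled as a string, so the other points on this slide still apply.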
SQL is a bad UI by today's standards, but it's even worse as an API
● Fails to separate concerns
○ Changing a query to improve performance may involve
breaking business logic
○ Requesting a little more data can have a large
performance hit
○ You cannot optimize SQL queries with AOP
● Leaking concepts
○ We leak our entire data structure to the DB
■ That is why a good ORM should generate DDL from source code, and not the other way around
○ To solve performance issues we may even leak some of our business logic (e.g. aggregating data)
■ Into the one component that is hardest to scale
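A minimal sketch of "generating DDL from source code": the schema is declared as a plain JavaScript object in the application, and the `CREATE TABLE` statement is derived from it. The schema format, `toDDL` and the type mapping are hypothetical; real ORMs each have their own conventions.

```javascript
// Hypothetical schema, declared in application code (the source of truth).
const orderSchema = {
  table: "orders",
  fields: {
    id: { type: "int", primaryKey: true },
    customerId: { type: "int" },
    placedAt: { type: "datetime" },
    note: { type: "string", nullable: true },
  },
};

// Assumed mapping from schema types to SQL column types.
const SQL_TYPES = { int: "INTEGER", string: "VARCHAR(255)", datetime: "TIMESTAMP" };

// Derives a CREATE TABLE statement from the schema object.
function toDDL(schema) {
  const columns = Object.entries(schema.fields).map(([name, field]) => {
    const sqlType = SQL_TYPES[field.type];
    if (!sqlType) throw new Error(`No SQL type for ${field.type}`);
    const parts = [name, sqlType];
    if (field.primaryKey) parts.push("PRIMARY KEY");
    else if (!field.nullable) parts.push("NOT NULL");
    return "  " + parts.join(" ");
  });
  return `CREATE TABLE ${schema.table} (\n${columns.join(",\n")}\n);`;
}

console.log(toDDL(orderSchema));
```

The direction of the dependency is the point: the table follows the code, so the code never has to mirror a schema that was designed in the database.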
Origin of SQL
SEQUEL: A STRUCTURED ENGLISH QUERY LANGUAGE (1974)
by Donald D. Chamberlin and Raymond F. Boyce
"SEQUEL identifies a set of simple operations
on tabular structures, which can be shown to
be of equivalent power to the first order
predicate calculus."
Non-tabular structures
● Connections between people
● Ownership relations
● Documents (like articles, or presentations)
● Data that belongs to a video on YouTube:
○ Video
○ Comments
○ Likes
○ etc.
● Or more abstractly
○ hierarchies
○ graphs
So we have non-tabular data
[Diagram: a Customer's order made of an Order id, several Order items and Payment details]
And tables to store that in
[Diagram: the same order data next to a table named OrdersCustomersItemsPayments]
Impedance mismatch
Data normalization
[Diagram: the same order data and the OrdersCustomersItemsPayments table]
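To make the mismatch concrete, here is the order from the diagrams as one nested JavaScript object, split into normalized rows and then reassembled by matching keys. All field and table names are illustrative assumptions; a real SQL database would do the reassembly with joins.

```javascript
// One order as the application sees it: a single nested structure.
const order = {
  id: 42,
  customer: { id: 7, name: "Alice" },
  items: [
    { sku: "A1", qty: 2 },
    { sku: "B2", qty: 1 },
  ],
  payment: { method: "card", amount: 30 },
};

// Normalization: split the structure into flat rows linked by keys.
function normalize(o) {
  return {
    customers: [{ id: o.customer.id, name: o.customer.name }],
    orders: [{ id: o.id, customerId: o.customer.id }],
    orderItems: o.items.map((item) => ({ orderId: o.id, ...item })),
    payments: [{ orderId: o.id, ...o.payment }],
  };
}

// Reading the order back means matching rows on those keys (a "join").
function loadOrder(tables, orderId) {
  const row = tables.orders.find((r) => r.id === orderId);
  const customer = tables.customers.find((c) => c.id === row.customerId);
  const items = tables.orderItems
    .filter((i) => i.orderId === orderId)
    .map(({ orderId: _itemKey, ...item }) => item);
  const { orderId: _paymentKey, ...payment } = tables.payments.find(
    (p) => p.orderId === orderId
  );
  return { id: row.id, customer, items, payment };
}

const tables = normalize(order);
console.log(tables.orderItems.length); // 2
```

One logical object becomes four collections, and every read has to put it back together.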
So we normalize our structures
● Strongly related data will be scattered all around
the hard drive
● Performance issues
● The DBA requests denormalization
○ Again: changing code for performance reasons in a way
that potentially breaks business logic
● Denormalized data is not indexed by the SQL
database
○ So we create index tables...
● The code using the denormalized tables will be
a lot harder to maintain and understand
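A minimal sketch of the "index tables" problem, with JavaScript `Map`s standing in for tables: the items are packed into a serialized column, so the application has to maintain a separate SKU index by hand, including removing stale entries when an order changes. The names and storage layout are assumptions for illustration.

```javascript
// Denormalized storage: each order row keeps its items serialized as JSON,
// so the database cannot index the individual items.
const ordersTable = new Map(); // orderId -> { customerId, itemsJson }
// Hand-maintained "index table": sku -> Set of orderIds containing it.
const skuIndex = new Map();

function saveOrder(orderId, customerId, items) {
  // If the order is being overwritten, drop its old index entries first.
  const previous = ordersTable.get(orderId);
  if (previous) {
    for (const { sku } of JSON.parse(previous.itemsJson)) {
      skuIndex.get(sku)?.delete(orderId);
    }
  }
  ordersTable.set(orderId, { customerId, itemsJson: JSON.stringify(items) });
  for (const { sku } of items) {
    if (!skuIndex.has(sku)) skuIndex.set(sku, new Set());
    skuIndex.get(sku).add(orderId);
  }
}

// Lookup by item goes through the index, not through the orders table.
function ordersContaining(sku) {
  return [...(skuIndex.get(sku) ?? [])];
}

saveOrder(1, 7, [{ sku: "A1", qty: 2 }]);
saveOrder(2, 8, [{ sku: "A1", qty: 1 }, { sku: "B2", qty: 3 }]);
saveOrder(1, 7, [{ sku: "B2", qty: 1 }]); // order 1 changes: the index must follow
console.log(ordersContaining("A1")); // [ 2 ]
console.log(ordersContaining("B2")); // [ 2, 1 ]
```

Forget the cleanup loop in `saveOrder` and the index silently returns orders that no longer contain the item, which is exactly the kind of maintenance burden the slide describes.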
SQL tries too hard, and we abuse it
● SQL databases are more than just tabular data
stores
○ They enforce a data transfer mechanism
■ Why do I need to use TCP/IP to reach a local DB?
■ And I even need to authenticate!
○ They are indexing services but with very limited
capabilities.
● Why do we use SQL databases for:
○ Storing temporary data locally (maybe files or memory?)
○ Storing documents (maybe document stores?)
○ Message queues (it's terrible for that!)
○ Inter-process communication (I mean... REALLY?!)
● It enforces a data transfer mechanism, so tests that use the
database run really slowly
○ Even if the data is in an in-memory table
○ So we don't test the DB... or only if we must...
● On the other hand, since it's a complicated and
frequently used API, one would be tempted to write a
complete fake
○ One that stores stuff in memory, and won't use TCP/IP to
communicate
○ It's almost impossible to do that well, so we don't...
● But SQL queries may be changed for performance
optimization in ways that break logic...
It makes testing hard
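Faking all of SQL is hard, but faking a narrow interface behind the DB layer is not. A minimal sketch, assuming a hypothetical order repository with just two methods, `save` and `findByCustomer`: the business logic depends only on those methods, so a test can use an in-memory implementation with no TCP/IP and no authentication.

```javascript
// In-memory fake of a hypothetical order repository with two methods:
//   save(order) and findByCustomer(customerId).
// A production implementation would provide the same methods backed by a DB.
class InMemoryOrderRepository {
  constructor() {
    this.orders = new Map(); // id -> order
  }

  save(order) {
    // Store a shallow copy so later changes by the caller don't leak in.
    this.orders.set(order.id, { ...order });
  }

  findByCustomer(customerId) {
    return [...this.orders.values()].filter((o) => o.customerId === customerId);
  }
}

// Business logic that only knows about the repository interface, not SQL.
function totalSpent(repository, customerId) {
  return repository
    .findByCustomer(customerId)
    .reduce((sum, order) => sum + order.amount, 0);
}

const repo = new InMemoryOrderRepository();
repo.save({ id: 1, customerId: 7, amount: 20 });
repo.save({ id: 2, customerId: 7, amount: 10 });
repo.save({ id: 3, customerId: 8, amount: 99 });
console.log(totalSpent(repo, 7)); // 30
```

The smaller the interface, the easier it is to keep the fake and the real implementation behaving the same.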
So what would be better?
● A native API instead of string commands. I
should be able to independently specify:
○ what data to collect or save
○ how that should be done
○ what to index
○ the way these commands are sent to the DB
● The API should be as simple as possible
● Schema-less data structures
○ And if you like static typing, you can define your
schema as data structures or classes
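One possible reading of "a native API instead of string commands", sketched in JavaScript: the SQUARE/SEQUEL example query from earlier becomes a plain object describing what data to collect, and a separate executor decides how to run it. The query shape and `runInMemory` are hypothetical; a production executor could translate the same object for a real database, while tests use the in-memory one.

```javascript
// The query is data, not a string: what to fetch, independent of how.
const query = {
  from: "emp",
  where: { dept: "TOY", mgr: "ANDERSON" },
  select: ["name", "sal"],
};

// One possible executor: runs the query against in-memory collections.
// Another executor could send the same query object to a real database.
function runInMemory(collections, q) {
  return collections[q.from]
    .filter((row) =>
      Object.entries(q.where).every(([field, value]) => row[field] === value)
    )
    .map((row) => Object.fromEntries(q.select.map((field) => [field, row[field]])));
}

const collections = {
  emp: [
    { name: "SMITH", sal: 3000, dept: "TOY", mgr: "ANDERSON" },
    { name: "JONES", sal: 3500, dept: "TOY", mgr: "BAKER" },
    { name: "LEE", sal: 2800, dept: "SHOE", mgr: "ANDERSON" },
  ],
};

console.log(runInMemory(collections, query)); // [ { name: 'SMITH', sal: 3000 } ]
```

Because the query is data, it can be inspected, composed or logged without parsing, and there is nothing to escape.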
In an SQL database the data structure is leaked to the DB
[Diagram: the APP depends on the SQL DB, which holds the leaked structure]
That is the primary reason we introduced the DB layer
[Diagram: a DBLayer between the APP and the SQL DB (holding the leaked structure), with dependencies across both boundaries]
A schema-less database puts the schema on the right side of the DB layer boundary
[Diagram: APP, DBLayer and NoSQL DB connected by dependencies; the data structure now lives on the application side]
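A minimal sketch of that last arrangement: a `Map` stands in for a schema-less document store, the `Customer` class is the schema, and `CustomerLayer` is the only code that maps one to the other. All three names are hypothetical.

```javascript
// Stand-in for a schema-less document store: it keeps whatever it is given.
class DocumentStore {
  constructor() {
    this.docs = new Map();
  }
  put(key, doc) {
    this.docs.set(key, JSON.stringify(doc));
  }
  get(key) {
    const raw = this.docs.get(key);
    return raw === undefined ? undefined : JSON.parse(raw);
  }
}

// The schema lives in the application, as a class.
class Customer {
  constructor(id, name, email) {
    this.id = id;
    this.name = name;
    this.email = email;
  }
}

// The DB layer is the only place that knows how a Customer maps to a document.
class CustomerLayer {
  constructor(store) {
    this.store = store;
  }
  save(customer) {
    this.store.put(`customer:${customer.id}`, { ...customer });
  }
  load(id) {
    const doc = this.store.get(`customer:${id}`);
    return doc && new Customer(doc.id, doc.name, doc.email);
  }
}

const layer = new CustomerLayer(new DocumentStore());
layer.save(new Customer(7, "Alice", "alice@example.com"));
console.log(layer.load(7) instanceof Customer); // true
```

In this sketch, adding a field to `Customer` only touches the class and the layer; the store itself has no schema to migrate.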
And by the way, we are hiring:
c0de-x.com
@devillsroom
