What SQL should actually be...



Published in: Technology

  1. What SQL should actually be...
     Rafael Ördög, PhD.
  2. Would you ever do this?
     function initializeAutoSave() {
         setTimeout(function () {
             $("input#save").click();
             initializeAutoSave();
         }, 2000);
     }
  3. How about doing this?
     function createThumbnail($sourceImage) {
         $size = Config::get("thumb-size");
         $targetImage = $this->getTargetName($sourceImage);
         system("convert {$sourceImage} -resize {$size}x{$size} {$targetImage}");
     }
  4. And yet we somehow accept this...
     $dbh->query("SELECT field FROM table");
  5. And if we loathe this:
     <script type="text/javascript">
     <?php foreach ($buttons as $button): ?>
         initButton(<?= $button["id"] ?>, <?= $button["usefulVar"] ?>);
     <?php endforeach; ?>
     </script>
  6. Why aren't we horrified by this?
     function doSomeImportantAndComplexStuffInTheDatabase() {
         $this->dbh->exec("UPDATE {$this->generateComplexJoins()} " .
             "SET {$this->generateFieldsAndValueSubQueries()} " .
             "WHERE {$this->generateWhere()}");
     }
     function generateWhere() {
         $result = "TRUE";
         foreach ($this->importantStuffs as $stuff) {
             $result .= " AND " . $stuff->subWhere();
         }
         return $result;
     }
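The string-building pattern on slide 6 can be avoided by describing the conditions as data. One function then turns them into SQL text with placeholders plus a separate list of values. This is a minimal illustration in JavaScript, not a real library. The following are all assumptions:

- `buildWhere`
- the `{ field, op, value }` condition shape
- the column and operator whitelists
- the `?` placeholder style

```javascript
// Hypothetical whitelists: identifiers cannot be bound as parameters,
// so only known column names and operators may enter the SQL text.
const ALLOWED_FIELDS = new Set(["status", "owner_id", "priority"]);
const ALLOWED_OPS = new Set(["=", "<", ">", "<=", ">="]);

// Turns { field, op, value } conditions into a WHERE clause with "?"
// placeholders and a parallel array of values. Values never become SQL text.
function buildWhere(conditions) {
  const clauses = [];
  const params = [];
  for (const { field, op, value } of conditions) {
    if (!ALLOWED_FIELDS.has(field)) throw new Error(`Unknown field: ${field}`);
    if (!ALLOWED_OPS.has(op)) throw new Error(`Unsupported operator: ${op}`);
    clauses.push(`${field} ${op} ?`);
    params.push(value);
  }
  // An empty list matches every row, like the "TRUE" seed on the slide.
  const sql = clauses.length > 0 ? clauses.join(" AND ") : "TRUE";
  return { sql, params };
}

const where = buildWhere([
  { field: "status", op: "=", value: "open" },
  { field: "priority", op: ">=", value: 3 },
]);
console.log(where.sql);    // status = ? AND priority >= ?
console.log(where.params); // [ 'open', 3 ]
```

Because the values never become part of the SQL text, no escaping is involved. The generated clause can also be inspected or compared in a test without touching a database.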
  7. Donald D. Chamberlin, the father of SQL
  8. Origin of SQL
     "SEQUEL: A Structured English Query Language" (1974) by Donald D. Chamberlin and Raymond F. Boyce
  9. Origin of SQL
     "SEQUEL: A Structured English Query Language" (1974) by Donald D. Chamberlin and Raymond F. Boyce
     "However, there is also a large class of users who, while they are not computer specialists, would be willing to learn to interact with a computer in a reasonably high-level, non-procedural query language."
  10. SQL is a user interface
      ● It's not an API
        ○ So that's why we need an ORM tool...
      ● It's not a protocol
      ● It's not even designed for programmers
      ● It was, however, derived from another database CLI called SQUARE, which looks a bit more like a protocol:
        ○ NAME, SAL EMP DEPT, MGR (TOY, ANDERSON)
          VS
        ○ SELECT NAME, SAL FROM EMP WHERE DEPT = 'TOY' AND MGR = 'ANDERSON'
  11. The command line user interface is not an API
      ● Leaking logic to other languages
      ● Dynamically generated code is hard to debug
      ● Security issues
        ○ Escaping is a horror scene
      ● Large overhead
        ○ Process launch overhead
        ○ Parse overhead
        ○ Command generation overhead
      ● Fragility
        ○ It's more prominent with a GUI, but CLIs are not much better
        ○ Have you ever tried to maintain a moderately sized Greasemonkey script? It's a nightmare!
  12. SQL is a bad UI by today's standards, but it's even worse as an API
      ● Fails to separate concerns
        ○ Changing a query to improve performance may involve breaking business logic
        ○ Requesting a little more data can have a large performance hit
        ○ You cannot optimize SQL queries with AOP
      ● Leaking concepts
        ○ We leak our entire data structure to the DB
          ■ That is why a good ORM should generate DDL from source code and not the other way around
        ○ To solve performance issues, we may even leak some of our business logic (aggregating data)
          ■ To the one thing that is hard to scale
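This is a rough illustration of generating DDL from source code rather than the other way around. The schema lives in the application as a plain object, and the CREATE TABLE statement is derived from it. It is a minimal sketch: the schema format, the `toCreateTable` function and the column types are hypothetical, and only primary keys and NOT NULL are handled.

```javascript
// Hypothetical schema description kept in application code.
const orderSchema = {
  table: "orders",
  columns: {
    id: { type: "INTEGER", primaryKey: true },
    customer_id: { type: "INTEGER", nullable: false },
    placed_at: { type: "TEXT", nullable: false },
    note: { type: "TEXT" },
  },
};

// Derives a CREATE TABLE statement from the schema object, so the code,
// not the database, is the single source of truth for the structure.
function toCreateTable({ table, columns }) {
  const defs = Object.entries(columns).map(([name, col]) => {
    let def = `${name} ${col.type}`;
    if (col.primaryKey) def += " PRIMARY KEY";
    else if (col.nullable === false) def += " NOT NULL";
    return def;
  });
  return `CREATE TABLE ${table} (\n  ${defs.join(",\n  ")}\n);`;
}

console.log(toCreateTable(orderSchema));
```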
  13. Origin of SQL
      "SEQUEL: A Structured English Query Language" (1974) by Donald D. Chamberlin and Raymond F. Boyce
      "SEQUEL identifies a set of simple operations on tabular structures, which can be shown to be of equivalent power to the first order predicate calculus."
  14. Non-tabular structures
      ● Connections between people
      ● Ownership relations
      ● Documents (like articles or presentations)
      ● Data that belongs to a video on YouTube:
        ○ Video
        ○ Comments
        ○ Likes
        ○ etc.
      ● Or, more abstractly:
        ○ hierarchies
        ○ graphs
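Hierarchies are a typical example. In a table they are usually stored as an adjacency list, where each row points to its parent, and the application has to rebuild the tree itself. This minimal sketch assumes hypothetical `{ id, parentId, name }` rows, such as a comment thread under a video:

```javascript
// Flat rows as a table would return them: each row references its parent.
const rows = [
  { id: 1, parentId: null, name: "Great video!" },
  { id: 2, parentId: 1, name: "Agreed" },
  { id: 3, parentId: 1, name: "Not really" },
  { id: 4, parentId: 3, name: "Why not?" },
  { id: 5, parentId: null, name: "First" },
];

// Rebuilds the nested structure in two linear passes using a Map by id.
// Assumes every non-null parentId refers to a row in the input.
function buildTree(flatRows) {
  const nodes = new Map();
  for (const row of flatRows) nodes.set(row.id, { ...row, children: [] });
  const roots = [];
  for (const node of nodes.values()) {
    if (node.parentId === null) roots.push(node);
    else nodes.get(node.parentId).children.push(node);
  }
  return roots;
}

const threads = buildTree(rows);
console.log(threads[0].children.map((c) => c.name)); // [ 'Agreed', 'Not really' ]
```

The table returns a flat list either way; the nesting exists only in application code.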
  15. So we have non-tabular data
      [Diagram: an order consisting of Customer, Order id, three Order items and Payment details]
  16. And tables to store that in
      [Diagram: the same order next to the Orders, Customers, Items and Payments tables]
  17. And tables to store that in
      [Diagram: the order and the Orders, Customers, Items and Payments tables, with the label "Impedance mismatch"]
  18. Data normalization
      [Diagram: the order (Customer, Order id, Order items, Payment details) and the Orders, Customers, Items and Payments tables]
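To make the mismatch concrete: the application works with one nested order. Storing it relationally means splitting it into rows for the four tables and joining them back on every read. In this minimal sketch, in-memory arrays stand in for the Orders, Customers, Items and Payments tables. The field names and the `normalize`/`reassemble` helpers are hypothetical.

```javascript
// A hypothetical order as the application sees it: one nested value.
const orderDocument = {
  id: 1001,
  customer: { id: 7, name: "Ada" },
  items: [
    { sku: "A-1", qty: 2 },
    { sku: "B-9", qty: 1 },
  ],
  payment: { method: "card", amount: 42.5 },
};

// Splits the document into rows for four tables linked by keys.
function normalize(order) {
  return {
    customers: [{ id: order.customer.id, name: order.customer.name }],
    orders: [{ id: order.id, customer_id: order.customer.id }],
    items: order.items.map((item, i) => ({
      order_id: order.id, line: i + 1, sku: item.sku, qty: item.qty,
    })),
    payments: [{ order_id: order.id, ...order.payment }],
  };
}

// Reading the order back means "joining" the four tables again by key.
function reassemble(tables, orderId) {
  const order = tables.orders.find((o) => o.id === orderId);
  const customer = tables.customers.find((c) => c.id === order.customer_id);
  const items = tables.items
    .filter((it) => it.order_id === orderId)
    .sort((a, b) => a.line - b.line)
    .map(({ sku, qty }) => ({ sku, qty }));
  const { method, amount } = tables.payments.find((p) => p.order_id === orderId);
  return { id: order.id, customer: { ...customer }, items, payment: { method, amount } };
}

const tables = normalize(orderDocument);
const roundTrip = reassemble(tables, 1001);
console.log(JSON.stringify(roundTrip) === JSON.stringify(orderDocument)); // true
```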
  19. So we normalize our structures
      ● Strongly related data will be scattered all around the hard drive
      ● Performance issues
      ● DBA requests a denormalization
        ○ Again: changing code for performance reasons in a way that potentially breaks business logic
      ● Denormalized data is not indexed by the SQL database
        ○ So we create index tables...
      ● The code using the denormalized tables will be a lot harder to maintain and understand
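The index tables mentioned on slide 19 can be sketched the same way. Once an order is stored denormalized as a single serialized value, the fields inside it are not indexed. The application therefore keeps its own lookup table and has to update it on every write. In this minimal in-memory sketch, two `Map`s stand in for the denormalized table and the index table, and all names are hypothetical.

```javascript
// Denormalized storage: one serialized blob per order, keyed by order id.
const ordersBlob = new Map();
// Hand-maintained "index table": customer id -> set of order ids.
const ordersByCustomer = new Map();

function saveOrder(order) {
  const previous = ordersBlob.get(order.id);
  // If the order moved to another customer, the stale index entry must go;
  // forgetting this step silently corrupts later lookups.
  if (previous !== undefined) {
    const old = JSON.parse(previous);
    ordersByCustomer.get(old.customerId)?.delete(order.id);
  }
  ordersBlob.set(order.id, JSON.stringify(order));
  if (!ordersByCustomer.has(order.customerId)) {
    ordersByCustomer.set(order.customerId, new Set());
  }
  ordersByCustomer.get(order.customerId).add(order.id);
}

function findOrdersByCustomer(customerId) {
  const ids = ordersByCustomer.get(customerId) ?? new Set();
  return [...ids].map((id) => JSON.parse(ordersBlob.get(id)));
}

saveOrder({ id: 1, customerId: 7, items: ["A-1"] });
saveOrder({ id: 2, customerId: 7, items: [] });
saveOrder({ id: 1, customerId: 8, items: ["A-1"] }); // order 1 changes owner
console.log(findOrdersByCustomer(7).map((o) => o.id)); // [ 2 ]
console.log(findOrdersByCustomer(8).map((o) => o.id)); // [ 1 ]
```

Every write path has to remember the index, including removing stale entries. That is the maintenance burden the slide refers to.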
  20. SQL tries too hard, and we abuse it
      ● SQL databases are more than just tabular data stores
        ○ They enforce a data transfer mechanism
          ■ Why do I need to use TCP/IP to reach a local DB?
          ■ And I even need to authenticate!
        ○ They are indexing services, but with very limited capabilities
      ● Why do we use SQL databases for
        ○ Storing temporary data locally (maybe files or memory?)
        ○ Storing documents (maybe document stores?)
        ○ Message queues (it's terrible for that!)
        ○ Inter-process communication (I mean... REALLY?!)
  21. It makes testing hard
      ● It enforces a data transfer mechanism, so it is really slow to run tests using the database
        ○ Even if the data is in an in-memory table
        ○ So we don't test the DB... or only if we must...
      ● On the other hand, since it's a complicated and frequently used API, one would be tempted to write a complete fake
        ○ One that stores stuff in memory and won't use TCP/IP to communicate
        ○ It's almost impossible to do that well, so we don't...
      ● But SQL queries may change during performance optimization in ways that break logic...
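A common workaround is to put storage behind a small interface of your own and give the tests an in-memory implementation. No network connection or SQL parsing is then involved. This is a minimal sketch: `InMemoryUserRepository`, its `save`/`findByEmail` methods and `registerUser` are hypothetical. A real SQL-backed repository would have to honour the same contract. As the slide warns, the fake only mirrors the behaviour you put into it.

```javascript
// In-memory fake of a hypothetical repository interface:
//   save(user) -> void, findByEmail(email) -> user | undefined
class InMemoryUserRepository {
  constructor() {
    this.usersById = new Map();
  }
  save(user) {
    // Store a copy so callers cannot mutate "persisted" state by accident.
    this.usersById.set(user.id, { ...user });
  }
  findByEmail(email) {
    for (const user of this.usersById.values()) {
      if (user.email === email) return { ...user };
    }
    return undefined;
  }
}

// Business logic depends only on the interface, not on SQL or a transport.
function registerUser(repo, id, email) {
  if (repo.findByEmail(email)) throw new Error("Email already registered");
  repo.save({ id, email });
}

const repo = new InMemoryUserRepository();
registerUser(repo, 1, "ada@example.com");
console.log(repo.findByEmail("ada@example.com")); // { id: 1, email: 'ada@example.com' }
```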
  22. So what would be better?
      ● A native API instead of string commands. I should be able to independently specify:
        ○ what data to collect or save
        ○ how that should be done
        ○ what to index
        ○ the way these commands are sent to the DB
      ● The API should be as simple as possible
      ● Schema-less data structure
        ○ And if you like static typing, then you can define your schema as data structures or classes
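Here is a toy illustration of that separation, not modelled on any particular product:

- The request is a plain object saying what to save or fetch.
- Indexes are declared independently.
- The store decides how to answer, by index lookup or by scan.
- The transport is an injected function.

`ToyStore`, `inProcessTransport` and the request format are hypothetical, and `find` supports only a single equality condition.

```javascript
// Hypothetical toy store: requests are plain objects, indexes are declared
// separately, and the store chooses how to answer each query.
class ToyStore {
  constructor() {
    this.docs = [];
    this.indexes = new Map(); // field name -> Map(value -> array of docs)
  }

  // "What to index" is declared independently of any query.
  addIndex(field) {
    const index = new Map();
    for (const doc of this.docs) addToIndex(index, doc[field], doc);
    this.indexes.set(field, index);
  }

  // "What to save or fetch" arrives as data, never as a command string.
  handle(request) {
    if (request.op === "save") {
      this.docs.push(request.doc);
      for (const [field, index] of this.indexes) {
        addToIndex(index, request.doc[field], request.doc);
      }
      return { ok: true };
    }
    if (request.op === "find") {
      // Only a single equality condition is supported in this sketch.
      const [[field, value]] = Object.entries(request.where);
      // "How" is decided here: use an index if one exists, otherwise scan.
      const index = this.indexes.get(field);
      const docs = index
        ? (index.get(value) ?? [])
        : this.docs.filter((doc) => doc[field] === value);
      return { ok: true, docs, usedIndex: Boolean(index) };
    }
    throw new Error(`Unknown op: ${request.op}`);
  }
}

function addToIndex(index, key, doc) {
  if (!index.has(key)) index.set(key, []);
  index.get(key).push(doc);
}

// "How commands are sent" is an injected function: an in-process call here,
// which could be replaced by a network client without touching the store.
const inProcessTransport = (store) => (request) => store.handle(request);

const store = new ToyStore();
const send = inProcessTransport(store);
store.addIndex("author");
send({ op: "save", doc: { id: 1, author: "ada", title: "Notes" } });
send({ op: "save", doc: { id: 2, author: "bob", title: "Draft" } });
const result = send({ op: "find", where: { author: "ada" } });
console.log(result.docs.map((d) => d.title), result.usedIndex); // [ 'Notes' ] true
```

Adding an index here changes how a query is answered but not what the query says, which is the independence the slide asks for.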
  23. In an SQL database, the data structure is leaked to the DB
      [Diagram: APP depends on the SQL DB, which holds the leaked structure]
  24. That is the primary reason we introduced the DB layer
      [Diagram: SQL DB (leaked structure), DB layer and APP, linked by dependencies]
  25. A schema-less database puts the schema on the right side of the DB layer boundary
      [Diagram: NoSQL DB, DB layer and APP, linked by dependencies, with the data structure on the APP side]
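In that arrangement, the store holds plain, schema-less documents and the structure lives in application code. For instance, it can be a class that the DB layer maps to and from. This is a minimal sketch: the `Article` class, `ArticleRepository` and the `Map` standing in for a document database are hypothetical.

```javascript
// The schema lives on the application side, as a class.
class Article {
  constructor({ id, title, tags = [] }) {
    if (typeof title !== "string" || title.length === 0) {
      throw new TypeError("Article.title must be a non-empty string");
    }
    this.id = id;
    this.title = title;
    this.tags = [...tags];
  }
  toDocument() {
    return { id: this.id, title: this.title, tags: [...this.tags] };
  }
  static fromDocument(doc) {
    return new Article(doc);
  }
}

// The DB layer: the only place that knows how objects become documents.
// The store itself (a Map here) accepts any document shape.
class ArticleRepository {
  constructor(store) {
    this.store = store;
  }
  save(article) {
    this.store.set(article.id, JSON.stringify(article.toDocument()));
  }
  load(id) {
    const raw = this.store.get(id);
    return raw === undefined ? undefined : Article.fromDocument(JSON.parse(raw));
  }
}

const articles = new ArticleRepository(new Map());
articles.save(new Article({ id: 1, title: "What SQL should actually be", tags: ["sql"] }));
const loaded = articles.load(1);
console.log(loaded instanceof Article, loaded.title); // true What SQL should actually be
```

In this sketch, adding a field to `Article` changes only application code, because the store accepts the new document shape as it is.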
  26. And by the way, we are