Hybrid Databases
1
Dave Stokes
MySQL Community Manager
Oracle Corporation
David.Stokes@Oracle.com
@Stoker
Slideshare.net/davidmstokes
https://elephantdolphin.blogspot.com/
https://github.com/davidmstokes/PHP-X-DevAPI
A (hopefully) Quick History
● The first computers had only memory
● Later came punched paper tape, cards, and reel tapes
● Disks – expensive, slow, physically large
2
Non-Sequential Reads
● Did not need to read the entire file – Yea!
● Need some way to judge offset from start/end of file
● Force all records to have a uniform length and ...
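A minimal Python sketch of the idea above: when every record has the same length, the byte offset of record n is just n times the record size, so a reader can seek straight to it. The 20-byte layout and the sample rows are assumptions for illustration, not something from the original slides.

```python
import io
import struct

# Hypothetical fixed-length layout: 4-byte unsigned id + 16-byte padded name.
RECORD = struct.Struct("<I16s")

def write_records(stream, rows):
    """Append each (id, name) row as one fixed-length record."""
    for rec_id, name in rows:
        stream.write(RECORD.pack(rec_id, name.encode("ascii")))

def read_record(stream, n):
    """Jump directly to record n (0-based) without reading earlier records."""
    stream.seek(n * RECORD.size)  # offset from the start of the file
    rec_id, raw_name = RECORD.unpack(stream.read(RECORD.size))
    return rec_id, raw_name.rstrip(b"\x00").decode("ascii")

# An in-memory stream stands in for a file on disk.
data_file = io.BytesIO()
write_records(data_file, [(101, "Telecaster"), (102, "Les Paul"), (103, "semi-hollow")])
third = read_record(data_file, 2)
```

The price of this simplicity is the uniform record length: short values are padded and long ones must be truncated or rejected.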
3
How to get to desired record(s)?!
● How about a key on each record?
● And an index to map to those data
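As a rough illustration of a key plus an index, the Python sketch below keeps a dictionary that maps each record's key to its byte offset, so a lookup is one dictionary probe and one seek. The serial-number format, field widths, and helper name are hypothetical.

```python
import io
import struct

# Hypothetical layout: 10-byte serial number (the key) + 16-byte model name.
RECORD = struct.Struct("10s16s")

rows = [("2019-0001", "Telecaster"), ("2019-0002", "Les Paul"), ("2019-0003", "semi-hollow")]

data_file = io.BytesIO()   # stands in for the data file
index = {}                 # key -> byte offset of the record

for position, (serial, model) in enumerate(rows):
    index[serial] = position * RECORD.size
    data_file.write(RECORD.pack(serial.encode("ascii"), model.encode("ascii")))

def lookup(serial):
    """Return the model for a serial number, or None if the key is not indexed."""
    offset = index.get(serial)
    if offset is None:
        return None
    data_file.seek(offset)
    _, raw_model = RECORD.unpack(data_file.read(RECORD.size))
    return raw_model.rstrip(b"\x00").decode("ascii")
```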
4
ISAM: Indexed Sequential Access Method!
In an ISAM system, data is organized into records which are
composed of fixed length fields. Records are stored sequentially,
originally to speed access on a tape system.
A secondary set of hash tables known as indexes contain
"pointers" into the tables, allowing individual records to be
retrieved without having to search the entire data set.
-- https://en.wikipedia.org/wiki/ISAM
5
ISAM: Yea & Nay
● Do not have to read entire file to find desired record(s), so
much better than before
● Limited
− Size
− Single key
− Relations
● Mess
● Depend on application
− On a memory limited machine (not FUN!)
6
Edgar Codd
Edgar Frank "Ted" Codd
(19 August 1923 – 18 April 2003)
was an English computer scientist
who, while working for IBM, invented
the relational model for database
management, the theoretical basis
for relational databases and
relational database management
systems
-- https://en.wikipedia.org/wiki/Edgar_F._Codd
7
Structured Query Language (SQL)
● Goals
− Speed
● Indexes
● Normalized data
− Minimal duplication of data (disks still EXPENSIVE)
− Relations between sets of data
8
Movement Away From ISAM
● Seeking
− Better performance
− Scaling
− More complex relations between data
− TRANSACTIONS
● ACID Compliance
− New ideas, new vendors
9
Normalized Data
Definition: the process of structuring a relational database in
accordance with a series of so-called normal forms in order to
reduce data redundancy and improve data integrity.
Normalization entails organizing the columns (attributes) and
tables (relations) of a database to ensure that their
dependencies are properly enforced by database integrity
constraints.
10
Cost Model
● The SQL data server needs a way to find the cheapest
way to return requested data, AKA Query Plan
− Based on past statistics from similar operations
− Think GPS
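A toy Python sketch of the GPS analogy: list the candidate routes, estimate a cost for each from statistics, and take the cheapest. The row count, selectivity, and per-row cost factors below are invented for illustration and do not reflect how MySQL's optimizer actually weights anything.

```python
# Hypothetical statistics about one table and one query predicate.
table_rows = 100_000
matching_fraction = 0.001      # share of rows expected to match the WHERE clause

# Assumed relative costs: sequential reads are cheap, index-driven reads cost more per row.
SEQUENTIAL_READ_COST = 1.0
INDEXED_READ_COST = 4.0

candidate_plans = {
    "full table scan": table_rows * SEQUENTIAL_READ_COST,
    "index lookup": table_rows * matching_fraction * INDEXED_READ_COST,
}

# The "query plan" is simply the candidate with the lowest estimated cost.
cheapest = min(candidate_plans, key=candidate_plans.get)
```

With a less selective predicate (say `matching_fraction = 0.5`) the same arithmetic favors the full scan, which is why stale statistics can lead to poor plans.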
11
Back to Data Normalization
● Compartmentalize data into logical groupings
● Most ‘experts’ recommend 3NF or better
− Definition
● Complex Relation
− Tables had to be set up
● Along with keys, indexes, etcetera
● Usually needed ‘DBA*’ to set things up before data could be stored
12
NoSQL
● NoSQL databases comprise a whole series of different types of
databases like graph databases, document stores, and many
others. This talk mainly focuses on JSON Document Stores
● Chicken Little – End of SQL, End of Relational Databases
● Mostly lacked ACID, rigor to data, logical operators (at first)
● Did not need ‘DBA’ setup, emphasis on CRUD, so faster to start coding
● Mutable
− Change on the fly as needed
13
Vendor Actions
● NoSQL
− Add indexes, more logical
operators
− Add ACID (not as easy as it
seems to do right)
● SQL
− Add JSON data type and
supporting operators
● Oracle, Microsoft, MySQL,
PostgreSQL*
− Start looking for ways to
leverage this feature
14
Thus Ends The History Lesson
15
MySQL’s JSON Data Type
● Introduced in version 5.7.8, improved in 8.0
− 1 GB payload per JSON type column
− 24+ supporting functions
● Probably the reason for more upgrades from 5.1/5.5/5.6 to
5.7 and/or 8.0 than any other feature
● Heart of the new X DevAPI (MySQL without the SQL)
16
So let's look at a SQL Relation
● Dave’s Custom Guitars
− Schemas
● Accounts Payable
● Accounts Receivable
● Employee
● Customers
● Parts suppliers
● Products
17
Product
18
Time to get to work
19
Product Data
● Serial number
− Date made plus some incrementing number
● Model
● Acoustic
● Electric
● Body style
− Materials
− Parts
● Tuners, frets, glue, screws, wiring, switches, pickups, bridge, etc.
20
Sample record
Field                          Contents
Serial Number (primary key)    yyyy-mm-dd – incrementing integer
Model                          Reference to another table
Materials                      Reference to other tables
Pickup #1 (optional)           Reference to another table
Pickup #2 (optional)           Reference to another table
Pickup #3 (optional)
Hardware                       Gold, chrome, nickel, black, anodized
Builder                        Link to employee table
21
Tell me all about my new guitar!!!
● To be able to tell a customer all about their guitar, you
would find the ‘master’ record by the serial number
Then it gets fun!
22
Body
● May be one piece or made of multiple pieces, multiple
materials
● May have to prove ‘source’ of the materials to government
(so you need a way to determine when/where/how the material
was acquired, plus its provenance)
● Different cuts, burling, finishes, bindings, inlays
23
Pickups - Zero or more
● Made in house?
● From a vendor?
● Type / Vendor model
● Colors
● Wiring options - taps, splits, coils, rails, magnet type,
number of winds, active/passive, etcetera
24
Gets Messy Quickly
[Diagram of related tables: guitar, body, neck, fretboard, frets, multineck, tuners]
All this and we haven’t picked out colours!!
25
Or inlays, or fret markers
26
Finite Element Analysis
27
Many to Many Joins
Each dive into the indexes and data has a cost
28
So let’s reduce DB dives
GUITAR
Serial Number - DOM + INTEGER
Model Number
Employee - employee number
Shipped date
Customer Number
Type - electric | acoustic
Images
Details (JSON)
Determine the most valuable data about an item, the data that
is requested/required most frequently ...
... and toss the rest in a JSON column
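One way to picture this split, sketched in Python: the frequently requested fields become ordinary columns and everything else is serialized into a single JSON document. The field names follow the GUITAR list above in snake_case; the helper name and the sample record are hypothetical.

```python
import json

# Fields promoted to real columns because they are requested most often (an assumption).
HOT_COLUMNS = ("serial_number", "model_number", "employee",
               "shipped_date", "customer_number", "type")

def split_record(record):
    """Return (column values, JSON text holding every remaining detail)."""
    columns = {name: record.get(name) for name in HOT_COLUMNS}
    details = {key: value for key, value in record.items() if key not in HOT_COLUMNS}
    return columns, json.dumps(details, sort_keys=True)

guitar = {
    "serial_number": "2019-02-22-0001",
    "model_number": 7,
    "employee": 42,
    "shipped_date": None,
    "customer_number": 1001,
    "type": "electric",
    "neck": {"material": "maple", "fretboard": "ebony"},
    "body": {"material": "one piece swamp ash"},
}

columns, details_json = split_record(guitar)
```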
29
JSON Data
{ "neck" :
  { "material" : "maple",
    "fretboard" : "ebony",
    "tuners" : "Grover locking chrome",
    "frets" : [ "stainless", "high" ],
    "finish" : "urethane" },
  "body" :
  { "material" : "one piece swamp ash",
    "pattern" : "tele - H H",
    "finish" : "Dupont red #4",
    "binding" : "none" }
}
Because the JSON data is mutable, we can later add details as
needed without having to restructure the underlying database.
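A small Python sketch of what "mutable" means here: new keys are added to the stored document and it is written back, with no schema change involved. The added "inlays" and "pickups" values are made up for illustration.

```python
import json

# A trimmed version of the guitar details document, as it might be stored.
stored = '{"neck": {"material": "maple"}, "body": {"material": "one piece swamp ash"}}'

details = json.loads(stored)

# Later requirements: record inlays and pickups. No ALTER TABLE is needed,
# because the new keys simply become part of the document.
details["body"]["inlays"] = "abalone dots"            # hypothetical new detail
details["pickups"] = [{"position": "bridge", "type": "humbucker"}]

updated = json.dumps(details)
```

Older documents that lack the new keys remain valid, so application code should treat such keys as optional.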
30
Later normalize as needed
GUITAR
Serial Number - DOM + INTEGER
Model Number
Employee - employee number
Production Data (new table)
Customer Number
Type - electric | acoustic
Images
Details (JSON)
Production data
Date started
Location
Rough finish
Paint start
Paint finished
Final assembly
Shipped
31
MySQL Document Store
● Use schemaless JSON documents so you do not have to
normalize data and code before you know the complete schema
● Do not have to embed SQL strings in your code
● Use a modern programming-style API
● Be able to use the JSON data from SQL or NoSQL
○ Best of both worlds
32
MySQL Document Store
33
34
35
36
Index: Document Store or SQL
db.dave_guitar.createIndex("guitar_type",
  {fields:[{"field": "$.type",
            "type":"TEXT(20)",
            required:true}]
});
OR
ALTER TABLE dave_guitar ADD type_index
  VARCHAR(30) AS
  (JSON_UNQUOTE(doc->"$.type"));
37
Find desired record
db.dave_guitar.find("type = 'Les Paul'").fields("type")
[
{
"type": "Les Paul"
}
]
38
Refine find()
db.dave_guitar.find().fields("[type]").sort("[type]")
[
{
"[type]": [
"semi-hollow"
]
},
{
"[type]": [
"Telecaster"
]
},
{
"[type]": [
"Les Paul"
]
}
]
39
find()
40
No More Embedded SQL in PHP
$SQLQuery = "SELECT * FROM people WHERE job LIKE '"
    . $job . "' AND age > $age";
Versus
$collection = $schema->getCollection("people");
$result = $collection
->find('job like :job and age > :age')
->bind(['job' => 'Butler', 'age' => 16])
->execute();
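The contrast above can be tried out in a few lines of Python. This sketch uses the standard library's sqlite3 module as a stand-in (it is not the X DevAPI, and SQLite is not MySQL); it only illustrates why bound parameters are safer than pasting values into the SQL string. The sample rows and the hostile input are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT, job TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?, ?)",
    [("Alfred", "Butler", 60), ("Jeeves", "Butler", 15), ("Bob", "Baker", 40)],
)

# Input that an attacker might supply for the job filter.
hostile_job = "Butler' OR '1'='1"

# Concatenation: the input is parsed as SQL and changes the meaning of the query.
concatenated = ("SELECT name FROM people WHERE job LIKE '" + hostile_job
                + "' AND age > 16 ORDER BY name")
leaked = conn.execute(concatenated).fetchall()

# Binding: the same input is treated purely as a value to compare against.
bound_sql = "SELECT name FROM people WHERE job LIKE :job AND age > :age ORDER BY name"
safe = conn.execute(bound_sql, {"job": hostile_job, "age": 16}).fetchall()
normal = conn.execute(bound_sql, {"job": "Butler", "age": 16}).fetchall()
```

Here `leaked` contains every row in the table, while the bound version returns nothing for the hostile input and only the matching butler for the normal one.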
41
Also Works With Tables
#!/bin/php
<?php
$session =
mysql_xdevapi\getSession("mysqlx://root:hidave@localhost:33060");
if ($session === NULL) {
die("Connection could not be established");
}
$schema = $session->getSchema("world");
$table = $schema->getTable("city");
$row = $table->select('Name','District')
->where('District like :district')
->bind(['district' => 'Texas'])
->limit(25)
->execute()->fetchAll();
42
Complex Analytics
WITH cte1 AS
(SELECT doc->>"$.name" AS name,
doc->>"$.cuisine" AS cuisine,
(SELECT AVG(score) FROM
JSON_TABLE(doc, "$.grades[*]" COLUMNS
(score INT PATH "$.score")) AS r) AS
avg_score
FROM restaurants)
SELECT *, RANK() OVER
(PARTITION BY cuisine ORDER BY avg_score
DESC) AS `rank`
FROM cte1
ORDER BY `rank`, avg_score DESC LIMIT 10;
JSON_TABLE turns unstructured JSON documents into temporary
relational tables that can be processed with SQL
Windowing functions for analytics
Common Table Expressions make it easy to write subqueries
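To make the three steps of that query concrete, here is the same logic sketched in plain Python: flatten each document's grades into an average (the JSON_TABLE plus AVG step), rank within each cuisine (the window function), then order and limit. The three sample documents are invented and this is not how the server executes the query.

```python
from itertools import groupby
from statistics import mean

# Hypothetical documents shaped like the restaurants collection in the query.
restaurants = [
    {"name": "Luigi's", "cuisine": "Italian", "grades": [{"score": 10}, {"score": 12}]},
    {"name": "Roma", "cuisine": "Italian", "grades": [{"score": 8}]},
    {"name": "Taco Hut", "cuisine": "Mexican", "grades": [{"score": 9}, {"score": 11}]},
]

# CTE step: one row per restaurant with its average score.
rows = [(doc["name"], doc["cuisine"], mean(g["score"] for g in doc["grades"]))
        for doc in restaurants]

# Window step: RANK() OVER (PARTITION BY cuisine ORDER BY avg_score DESC).
rows.sort(key=lambda row: (row[1], -row[2]))
ranked = []
for cuisine, group in groupby(rows, key=lambda row: row[1]):
    previous_score, rank = None, 0
    for position, (name, _, avg_score) in enumerate(group, start=1):
        if avg_score != previous_score:   # tied scores share the earlier rank
            rank, previous_score = position, avg_score
        ranked.append((name, cuisine, avg_score, rank))

# Final step: ORDER BY rank, avg_score DESC LIMIT 10.
top = sorted(ranked, key=lambda row: (row[3], -row[2]))[:10]
```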
Works with SQL also!
$session->sql("CREATE DATABASE addressbook")
->execute();
43
New Shell with three modes
Built-in JavaScript and Python interpreters let you
work with your data in the MySQL Shell.
Plus you get command completion, great help
facilities, the ability to check for server upgrades,
and the ability to administer InnoDB Clusters.
And you can also use SQL
44
New Shell Bulk JSON Loader
45
So, Hybrid Databases ... ?
46
Is This a Panacea?
Nope, it is not.
For best performance out of a database you probably want third
normal form (or better) with a fully detailed relational schema
and carefully developed queries utilizing good indexes.
That is in a world where nothing changes after it goes into
production.
47
No, but it is an option
Do not know data structures at beginning
Data needs to be mutable
Small needs for data
(do not need that last millisecond)
Flux is the normal state of business
48
New in MySQL 8.0
1. True Data Dictionary
2. Default UTF8MB4
3. Windowing Functions, CTEs, Lateral Derived Joins
4. InnoDB SKIP LOCKED and NOWAIT
5. Instant Add Column
6. Histograms
7. Resource Groups
8. Better optimizer with new temporary table engine
9. True Descending Indexes
10. 3D GIS
11. JSON Enhancements
49
Please Buy My Book
If you use the JSON Data Type
from MySQL 5.7 or 8.0 and
want an easy to read, concise
guide then you need this book!
50
Q&A + Resources
● More Info on MySQL Document Store
○ PHP PECL Extension for X DevAPI
■ http://php.net/manual/en/book.mysql-xdevapi.php
○ MySQL Document Store
■ https://dev.mysql.com/doc/refman/8.0/en/document-store.html
○ X DevAPI User Guide
■ https://dev.mysql.com/doc/x-devapi-userguide/en/
○ Dev.MySQL.com for Downloads and Other Docs
○ X DevAPI Tutorial for Sunshine PHP
■ https://github.com/davidmstokes/PHP-X-DevAPI
● David.Stokes@Oracle.com
○ https://elephantdolphin.blogspot.com/
○ Slides at https://slideshare.net/davidmstokes
○ @Stoker
51

Hybrid Databases - PHP UK Conference 22 February 2019
