NoSQL and MySQL: News about JSON

NoSQL and SQL: The Best of Both
Worlds
Mario Beck
MySQL Presales Manager EMEA
Mablomy.blogspot.de
3rd November, 2015
Copyright © 2015, Oracle and/or its affiliates. All rights reserved.

Safe Harbor Statement
The following is intended to outline our general product direction. It is intended for
information purposes only, and may not be incorporated into any contract. It is not a
commitment to deliver any material, code, or functionality, and should not be relied upon
in making purchasing decisions. The development, release, and timing of any features or
functionality described for Oracle’s products remains at the sole discretion of Oracle.
Copyright 2015, Oracle and/or its affiliates. All rights reserved 2

NoSQL
Simple access patterns
Compromise on consistency
for performance
Ad-hoc data format
Simple operation
SQL
Complex queries with joins
ACID transactions
Well defined schemas
Rich set of tools
Still a role for SQL (RDBMS)?
Scalability
Performance
HA
Ease of use
SQL/Joins
ACID Transactions
26th March 2015 Copyright © 2015, Oracle and/or its affiliates. All rights reserved. 3

MySQL Cluster Overview
• In-Memory Optimization + Disk-Data
• Predictable Low-Latency, Bounded Access Time
REAL-TIME
• Auto-Sharding, Multi-Master
• ACID Compliant, OLTP + Real-Time Analytics
HIGH SCALE, READS +
WRITES
• Shared nothing, no Single Point of Failure
• Self Healing + On-Line Operations
99.999% AVAILABILITY
• Key/Value + Complex, Relational Queries
• SQL + Memcached + JavaScript + Java + HTTP/REST & C++
SQL + NoSQL
• Open Source + Commercial Editions
• Commodity hardware + Management, Monitoring Tools
LOW TCO

MySQL Cluster Scaling
MySQL Cluster Data Nodes
Clients
Application Layer
Data Layer

NoSQL Access to MySQL Cluster data
Apps Apps Apps Apps Apps Apps Apps Apps Apps Apps Apps Apps
JPA
Cluster JPA
PHP Perl Python Ruby JDBC Cluster J JS Apache Memcached
MySQL JNI Node.JS mod_ndb ndb_eng
NDB API (C++)

1.2 Billion UPDATEs per Minute
• Distributed Joins also
possible
0
5
10
15
20
25
2 4 6 8 10 12 14 16 18 20 22 24 26 28 30
MillionsofUPDATEsperSecond
Scalability a
Performance a
HA a
Ease of use a
SQL/Joins a
ACID Transactions a

• Memory optimized tables
– Durable
– Mix with disk-based tables
• Massively concurrent OLTP
• Distributed Joins for analytics
• Parallel table scans for non-indexed
searches
• MySQL Cluster 7.4 FlexAsych
– 200M NoSQL Reads/Second
26th March 2015 9
MySQL Cluster 7.4 NoSQL Performance
200 Million NoSQL Reads/Second
-
50,000,000
100,000,000
150,000,000
200,000,000
250,000,000
2 4 6 8 10 12 14 16 18 20 22 24 26 28 30 32
Readspersecond
Data Nodes
FlexAsync Reads

Cluster & Memcached - Configured Schema
<town:maidenhead,SL6>
prefix key value
<town:maidenhead,SL6>
key value
Prefix Table Key-col Val-col policy
town: map.zip town code cluster
Config tables
town ... code ...
maidenhead ... SL6 ...
map.zip
Application view
SQL view

Node.js NoSQL API • Native JavaScript access to MySQL Cluster
– End-to-End JavaScript: browser to the app & DB
– Storing and retrieving JavaScript objects
directly in MySQL Cluster
– Eliminate SQL transformation
• Implemented as a module for node.js
– Integrates Cluster API library within the web
app
• Couple high performance, distributed apps,
with high performance distributed database
• Optionally routes through MySQL Server
– Use with InnoDB
V8 JavaScript Engine
MySQL Cluster Node.js Module
Clients

NoSQL API for Node.js & FKs
FKs enforced on all APIs:
{ message: 'Error',
sqlstate: '23000',
ndb_error: null,
cause:
{message: 'Foreign key constraint violated: No parent row found [255]',
sqlstate: '23000',
ndb_error:
{ message: 'Foreign key constraint violated: No parent row found',
code: 255,
classification: 'ConstraintViolation',
handler_error_code: 151,
status: 'PermanentError' },
cause: null } }

SQL
• Industry standard
• Joins & Complex queries
• Relational model
Memcached
• Simple to use API
• Key/value
• Drivers for many languages
Mod-ndb
• REST
• Html
• Plugin for Apache
ClusterJ
• Simple to Use Java API
• Web & telco
• Object Relational Mapping
• Native & fast access to data
ClusterJPA
• OpenJPA plugin
• Standards defined ORM
• Cross table Joins
JavaScript/Node.js
• Native JavaScript: client to
DB
• Blazing fast asynchronous
throughput
Choosing the right application API

MySQL 5.6 Memcached with InnoDB
0
10000
20000
30000
40000
50000
60000
70000
80000
8 32 128 512
TPS
Client Connections
Memcached API
SQL
Clients and Applications
MySQL Server
Memcached Plug-in
innodb_
memcached
local cache
(optional)
Handler API InnoDB API
InnoDB Storage Engine
mysqld process
SQL Memcached Protocol
Up to 9x Higher “SET / INSERT” Throughput

Core New JSON features in MySQL 5.7
• Native JSON datatype
• JSON Functions
• Generated Columns
16

The JSON Type
17
CREATE TABLE employees (data JSON);
INSERT INTO employees VALUES ('{"id": 1, "name": "Jane"}');
INSERT INTO employees VALUES ('{"id": 2, "name": "Joe"}');
SELECT * FROM employees;
+---------------------------+
| data |
+---------------------------+
| {"id": 1, "name": "Jane"} |
| {"id": 2, "name": "Joe"} |
+---------------------------+
2 rows in set (0,00 sec)

JSON Type Tech Specs
• utf8mb4 character set
• Optimized for read intensive workload
• Parse and validation on insert only
• Dictionary
• Sorted objects' keys
• Fast access to array cells by index
18

JSON Type Tech Specs (cont.)
• Supports all native JSON types
• Numbers, strings, bool
• Objects, arrays
• Extended
• Date, time, datetime, timestamp
• Other
19

Advantages over TEXT/VARCHAR
1. Provides Document Validation:
2. Efficient Binary Format
Allows quicker access to object members and array elements
20
INSERT INTO employees VALUES ('some random text');
ERROR 3130 (22032): Invalid JSON text: "Expect a value here." at position 0 in
value (or column) 'some random text'.

JSON Functions
21
SET @document = '[10, 20, [30, 40]]';
SELECT JSON_EXTRACT(@document, '$[1]');
+---------------------------------+
| JSON_EXTRACT(@document, '$[1]') |
+---------------------------------+
| 20 |
+---------------------------------+
1 row in set (0.01 sec)

JSON Array Creation
22
SELECT JSON_ARRAY(id,
feature->"$.properties.STREET",
feature->'$.type") AS json_array
FROM features ORDER BY RAND() LIMIT 3;
+-------------------------------+
| json_array |
+-------------------------------+
| [65298, "10TH", "Feature"] |
| [122985, "08TH", "Feature"] |
| [172884, "CURTIS", "Feature"] |
+-------------------------------+
3 rows in set (2.66 sec)

JSON Object Creation
23
SELECT JSON_OBJECT('id', id,
'street', feature->"$.properties.STREET",
'type', feature->"$.type"
) AS json_object
FROM features ORDER BY RAND() LIMIT 3;
+--------------------------------------------------------+
| json_object |
+--------------------------------------------------------+
| {"id": 122976, "type": "Feature", "street": "RAUSCH"} |
| {"id": 148698, "type": "Feature", "street": "WALLACE"} |
| {"id": 45214, "type": "Feature", "street": "HAIGHT"} |
+--------------------------------------------------------+
3 rows in set (3.11 sec)

• 5.7 supports functions to CREATE, SEARCH, MODIFY and RETURN JSON
values:
JSON Functions
24
JSON_ARRAY_APPEND()
JSON_ARRAY_INSERT()
JSON_ARRAY()
JSON_CONTAINS_PATH()
JSON_CONTAINS()
JSON_DEPTH()
JSON_EXTRACT()
JSON_INSERT()
JSON_KEYS()
JSON_LENGTH()
JSON_MERGE()
JSON_OBJECT()
JSON_QUOTE()
JSON_REMOVE()
JSON_REPLACE()
JSON_SEARCH()
JSON_SET()
JSON_TYPE()
JSON_UNQUOTE()
JSON_VALID()
https://dev.mysql.com/doc/refman/5.7/en/json-functions.html

Tests Using Real Life Data
• Via SF OpenData
• 206K JSON objects
representing subdivision
parcels.
• Imported from https://github.com/zemirco/sf-city-lots-json + small tweaks
25
CREATE TABLE features (
id INT NOT NULL auto_increment primary key,
feature JSON NOT NULL
);

26
{
"type":"Feature",
"geometry":{
"type":"Polygon",
"coordinates":[
[
[-122.42200352825247,37.80848009696725,0],
[-122.42207601332528,37.808835019815085,0],
[-122.42110217434865,37.808803534992904,0],
[-122.42106256906727,37.80860105681814,0],
[-122.42200352825247,37.80848009696725,0]
]
]
},
"properties":{
"TO_ST":"0",
"BLKLOT":"0001001",
"STREET":"UNKNOWN",
"FROM_ST":"0",
"LOT_NUM":"001",
"ST_TYPE":null,
"ODD_EVEN":"E",
"BLOCK_NUM":"0001",
"MAPBLKLOT":"0001001"
}
}

Naive Performance Comparison
27
# as JSON type
SELECT DISTINCT
feature->"$.type" as json_extract
FROM features;
+--------------+
| json_extract |
+--------------+
| "Feature" |
+--------------+
Unindexed traversal of 206K documents
# as TEXT type
SELECT DISTINCT
feature->"$.type" as json_extract
FROM features;
+--------------+
| json_extract |
+--------------+
| "Feature" |
+--------------+
Explanation: Binary format of JSON type is very efficient at searching. Storing as TEXT
performs over 10x worse at traversal.
Using short cut for
JSON_EXTRACT. Coming
in 5.7.9.

Introducing Generated Columns
28
id my_integer my_integer_plus_one
1 10 11
2 20 21
3 30 31
4 40 41
CREATE TABLE t1 (
id INT NOT NULL PRIMARY KEY auto_increment,
my_integer INT,
my_integer_plus_one INT AS (my_integer+1)
);
UPDATE t1 SET my_integer_plus_one = 10 WHERE id = 1;
ERROR 3105 (HY000): The value specified for generated column 'my_integer_plus_one' in
table 't1' is not allowed.
Column automatically maintained based
on your specification.
Read-only of course

Generated Columns Support Indexes!
29
ALTER TABLE features ADD feature_type VARCHAR(30) AS (feature->"$.type");
Query OK, 0 rows affected (0.01 sec)
Records: 0 Duplicates: 0 Warnings: 0
ALTER TABLE features ADD INDEX (feature_type);
SELECT DISTINCT feature_type FROM features;
+--------------+
| feature_type |
+--------------+
| "Feature" |
+--------------+
From table scan on 206K documents to index scan on 206K materialized values
Down from
1.25 sec to 0.06 sec
Creates index only. Does
not modify table rows.
Meta data change only
(FAST). Does not need to
touch table.

Generated Columns (cont.)
• Used for “functional index”
• Available as either VIRTUAL (default) or STORED:
• Both types of computed columns permit for indexes to be added.
30
ALTER TABLE features ADD feature_type varchar(30) AS (feature->"$.type") STORED;

Indexing Options Available
31
STORED VIRTUAL
Primary and Secondary
BTREE, Fulltext, GIS
Mixed with fields
Requires table rebuild
Not Online
Secondary Only
BTREE Only
Mixed with fields
No table rebuild
INSTANT Alter
Faster Insert
Bottom Line: Unless you need a PRIMARY KEY, FULLTEXT or GIS index VIRTUAL is probably better.

Virtual vs. Stored Performance
• Approximate worst case scenario via a table scan:
32
SELECT DISTINCT feature_type FROM features;
+--------------+
| feature_type |
+--------------+
| "Feature" |
+--------------+
VIRTUAL-TEXT (9.89 sec)
STORED-TEXT (0.22 sec)
VIRTUAL-JSON (0.85 sec)
STORED-JSON (0.24 sec)
Clarification: Since indexes are materialized (stored) themselves, the real-life case for STORED
is when generating the column is computationally expensive and you can not use indexes
effectively.

Road Map
• In-place partial update of JSON/BLOB (performance)
• Partial streaming of JSON/BLOB (replication)
• Full text and GIS index on virtual columns
• Currently works for "STORED"
• Improved performance through condition pushdown
33

Prefer the Relational Model - Storing as a Column
• Easier to apply a schema to your application
• Schema may make applications easier to maintain over time, as change is
controlled;
• Do not have to expect as many permutations
• Allows some constraints over data
34

Prefer the Document Model - Storing as JSON
• More flexible way to represent data that is hard to model in schema;
• Easier denormalization; an optimization that is important in some
specific situations
• No painful schema changes*
• Easier prototyping, Fewer types to consider
• No enforced schema, start storing values immediately
35
* MySQL 5.6 has Online DDL. This is not as large of an issue as it
was historically.

Prefer the Hybrid Model – Just do it!
36
SSDs have capacity_in_gb, CPUs have a core_count. These attributes are not consistent
across products.
CREATE TABLE pc_components (
id INT NOT NULL PRIMARY KEY,
description VARCHAR(60) NOT NULL,
vendor VARCHAR(30) NOT NULL,
serial_number VARCHAR(30) NOT NULL,
attributes JSON NOT NULL
);

Prefer Simple Access Pattern – Using Key-Value
• Full access to relational data
– Value can be col1|col2|col3
– Value can be json
• Much higher throughput
• Only single Row,Primary Key Access
37
0
10000
20000
30000
40000
50000
60000
70000
80000
8 32 128 512
TPS
Client Connections
Memcached API
SQL

Options for Dev – Simplicity for Ops
• Always the same tool to Backup (MySQL Enterprise Backup)
• Always the same tool to Monitor (MySQL Enterprise Monitor)
• Always the same tool to Audit (MySQL Enterprise Audit)
• Always the same tool to Protect (MySQL Enterprise Firewall)
• Always the same source of Support (Oracle MySQL Support)
• Always the same way to Deploy (Repos, Openstack, ...)
38
Polyglot Persistence with Operational Stability

Resources
• http://mysqlserverteam.com/
• http://mysqlserverteam.com/tag/json/
• https://dev.mysql.com/doc/refman/5.7/en/mysql-nutshell.html
• http://dev.mysql.com/doc/relnotes/mysql/5.7/en/
• https://dev.mysql.com/doc/refman/5.7/en/json.html
• https://dev.mysql.com/doc/refman/5.7/en/json-functions.html
• http://www.thecompletelistoffeatures.com
39

Thank You!

NoSQL and MySQL: News about JSON

Recommended

Recommended

More Related Content

What's hot

What's hot (20)

Similar to NoSQL and MySQL: News about JSON

Similar to NoSQL and MySQL: News about JSON (20)

More from Mario Beck

More from Mario Beck (8)

Recently uploaded

Recently uploaded (20)

NoSQL and MySQL: News about JSON