HTTP, JSON, JavaScript, Map&Reduce built into MySQL: make it happen, today. See how a MySQL Server plugin can be developed to build all this into MySQL. A new direct wire between MySQL and client-side JavaScript is created. MySQL speaks HTTP, replies with JSON and offers server-side JavaScript. Server-side JavaScript gets access to MySQL data and does Map&Reduce of JSON documents stored in MySQL. Fast? 2-4x faster than proxying client-side JavaScript requests through PHP/Apache. Reasonable results...
HTTP, JSON, JavaScript, Map&Reduce built into MySQL
1. Ulf Wendel, Oracle
HTTP, JSON, JavaScript,
Map and Reduce
built into MySQL
Make it happen, today.
2. The speaker says...
MySQL is more than SQL!
What if...
… MySQL talked HTTP and replied with JSON
… MySQL had built-in server-side JavaScript for "MyApps"
… MySQL had a poor man's Map&Reduce for JSON documents
We, you and I, make it happen. Today.
You are watching the proof of concept.
3. Groundbreaking eye-openers
New client protocols
New access methods, additional data models
New output formats
MySQL as a storage framework
Mycached (2009, Cybozu)
HandlerSocket (2010, DeNA)
Drizzle HTTP JSON (2011, Stewart Smith)
InnoDB Memcached (2012, Oracle)
NDB/MySQL Cluster Memcached (2012, Oracle)
JavaScript/HTTP Interface (today, You)
4. The speaker says...
We thought pluggable storage was cool: different storage
backends for different purposes. We thought the dominant
relational data model was the one and SQL the appropriate
query language. We thought crash-safety, transactions and
scale-out through replication counted.
You wanted maximum performance. You had CPU-bound
in-memory workloads. You wanted the Key-Value model in
addition to the relational one. Key-Value is fantastic for
sharding. You wanted the lightweight Memcached protocol and
lightweight JSON replies. You did not need a powerful query
language such as SQL. Luckily, you saw MySQL as a
storage framework!
5. MySQL Server daemon plugins
Like PHP extensions! But not so popular
Library, loaded into the MySQL process
Expose tables, variables, functions to the user
Can start network servers/listeners
Can access data – with and without SQL
Diagram: MySQL serves the SQL/relational model on port 3306 and Key/Value access on port 11211 (Memcached for InnoDB, Memcached for Cluster)
6. The speaker says...
MySQL daemon plugins can be compared with PHP
extensions or Apache modules. A daemon plugin can be
anything that extends the MySQL Server. The MySQL source
has a blueprint, a daemon plugin skeleton: it contains as
few as 300 lines of code. The code is easy to read.
MySQL is written in C and portable C++.
The books "MySQL 5.1 Plugin Development" (Sergei
Golubchik, Andrew Hutchings) and "Understanding MySQL
Internals" (Sasha Pachev) get you started with plugin
development. Additional information, including examples, is
in the MySQL Reference Manual.
7. Performance
Memcached Key/Value access to MySQL
MySQL benefits: crash-safe, replication, ...
A few easy-to-understand client commands
Small-footprint network protocol
Community reports 10,000+ queries/s single-threaded
and 300,000+ queries/s with 100 clients
8. The speaker says...
I couldn't resist and will continue to show performance
figures to make my point. From now on, the machine used
for benchmarking is my notebook: Celeron Duo, 1.2 GHz,
32-bit virtual machine running openSUSE 12.1 with 1.5 GB of RAM
assigned, Windows XP as the host. Small datasets ensure that all
benchmarks run out of main memory.
Don't let the benchmarks distract you. Dream of MySQL
as a storage framework: for many data models, many network
protocols and even more programming languages.
For example, think of client-side JavaScript developers.
9. MySQL for client-side JavaScript
The PoC creates a direct wire
Sandbox: HTTP/Websocket network protocols only
Needs proxying to access MySQL
Extra deployments for proxying: LAMP or node.js
Proxying adds latency and increases system load
Diagram: Browser (JavaScript) -> port 80 -> Apache (PHP) -> port 3306 -> MySQL
10. The speaker says...
Client-side JavaScript runs in a sandbox. JavaScript
developers do HTTP/Websocket background requests to
fetch data from the web server.
Because MySQL does not understand HTTP/Websocket,
users need to set up and deploy a proxy for accessing
MySQL. For example, one can use PHP for proxying. PHP
interprets GET requests from JavaScript, connects to
MySQL, executes some SQL, translates the result into JSON
and returns it to JavaScript via HTTP.
Let's give JavaScript a direct wire to MySQL!
11. HTTP and JSON for MySQL
Like with PHP extensions!
Copy daemon plugin example, add your magic
Glue libraries: libevent (BSD), libevhtp (BSD)
Handle GET /?sql=<statement>, reply JSON
Chart: requests/s (0 to 2000) vs. concurrency (ab2 -c <n>, n = 1, 4, 8, 16, 32) for the PHP proxy and the server plugin
12. The speaker says...
First, we add an HTTP server to MySQL. MySQL shall
listen on port 8080, accept GET /?sql=SELECT%201,
run the SQL and reply with the result as JSON.
The HTTP server part is easy: we glue together existing,
proven BSD libraries.
A benchmark comes first, to motivate you. The chart compares the
resulting MySQL Server daemon plugin with a PHP script
that accepts a GET parameter with a SQL statement,
connects to MySQL, runs the SQL and returns JSON. The system
load reported by top is not shown in the chart: at a concurrency of
32, the load is 34 for PHP and 2.5 for the MySQL
Server daemon plugin...
13. Mission HTTP
Don't look at extending MySQL network modules!
Virtual I/O (vio) and Network (net) are fused
Start your own socket server in plugin init()
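/* Plugin initialization method called by MySQL */
static int conn_js_plugin_init(void *p) {
  ...
  /* See libevent documentation */
  evthread_use_pthreads();
  base = event_base_new();
  http = evhttp_new(base);
  /* Register generic callback to handle events */
  evhttp_set_gencb(http, conn_js_send_document_cb, docroot);
  handle = evhttp_bind_socket_with_handle(http, host, port);
  /* NOTE: event_base_dispatch() blocks until the event loop ends.
     Call it from a thread of its own, or MySQL startup hangs here. */
  event_base_dispatch(base);
  return 0; /* 0 = success for a MySQL plugin init() */
}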
14. The speaker says...
Don't bother using any network or I/O related code of
the MySQL server. Everything is optimized for the MySQL
protocol.
The way to go is setting up your own socket server
when the plugin is started during MySQL startup.
Plugins have init() and deinit() methods, very much like PHP
extensions have M|RINIT and M|RSHUTDOWN hooks.
You will easily find proper code examples on using libevent
and libevhtp. I show pseudo-code derived from my working
proof of concept.
15. Done with HTTP – for now
Request handling - see libevent examples
static void
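conn_js_send_document_cb(struct evhttp_request *req, void *arg) {
  /* ... */
  const char *uri = evhttp_request_get_uri(req);
  struct evhttp_uri *decoded = evhttp_uri_parse(uri);
  if (!decoded) {
    evhttp_send_error(req, HTTP_BADREQUEST, NULL);
    return;
  }
  /* parsing is in the libevent examples */
  if (sql[0]) {
    query_in_thd(&json_result, sql);
    struct evbuffer *evb = evbuffer_new();
    evbuffer_add_printf(evb, "%s", json_result.c_ptr());
    evhttp_add_header(evhttp_request_get_output_headers(req),
                      "Content-Type", "application/json");
    evhttp_send_reply(req, 200, "OK", evb);
    evbuffer_free(evb);
  } else {
    /* no sql parameter: reply instead of leaving the request open */
    evhttp_send_error(req, HTTP_BADREQUEST, NULL);
  }
  evhttp_uri_free(decoded);
}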
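The callback above leaves query-string parsing to the libevent examples (libevent provides evhttp_uri_get_query() and evhttp_parse_query_str() for that). To show what that step has to do, here is a dependency-free C++ sketch; `query_param()` and `percent_decode()` are illustrative names, not part of the PoC. It extracts one parameter and percent-decodes it, so `/?sql=SELECT%201` yields `SELECT 1`. Treating `+` as a space follows the HTML form-encoding convention and is an assumption about what clients send.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <optional>
#include <string>

// Value of one hexadecimal digit, or -1 if c is not a hex digit.
static int hex_value(char c) {
  if (c >= '0' && c <= '9') return c - '0';
  c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
  if (c >= 'a' && c <= 'f') return c - 'a' + 10;
  return -1;
}

// Percent-decode one query-string value: "%20" and '+' become ' '.
// Returns std::nullopt for a malformed escape such as "%1" or "%zz".
std::optional<std::string> percent_decode(const std::string& in) {
  std::string out;
  for (std::size_t i = 0; i < in.size(); ++i) {
    if (in[i] == '%') {
      if (i + 2 >= in.size()) return std::nullopt;  // needs two hex digits
      int hi = hex_value(in[i + 1]);
      int lo = hex_value(in[i + 2]);
      if (hi < 0 || lo < 0) return std::nullopt;
      out.push_back(static_cast<char>(hi * 16 + lo));
      i += 2;
    } else if (in[i] == '+') {
      out.push_back(' ');
    } else {
      out.push_back(in[i]);
    }
  }
  return out;
}

// Hypothetical helper: find parameter `name` in a request URI such as
// "/?sql=SELECT%201" and return its decoded value, or std::nullopt.
std::optional<std::string> query_param(const std::string& uri,
                                       const std::string& name) {
  std::size_t q = uri.find('?');
  if (q == std::string::npos) return std::nullopt;
  std::size_t pos = q + 1;
  while (pos <= uri.size()) {
    std::size_t amp = uri.find('&', pos);
    if (amp == std::string::npos) amp = uri.size();
    std::string pair = uri.substr(pos, amp - pos);  // "key=value"
    std::size_t eq = pair.find('=');
    if (eq != std::string::npos && pair.substr(0, eq) == name)
      return percent_decode(pair.substr(eq + 1));
    pos = amp + 1;
  }
  return std::nullopt;
}
```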
16. The speaker says...
You are making huge steps forward doing nothing
but copying public libevent documentation examples
and adapting them!
The hardest part is yet to come: learning how to run a
SQL statement and how to convert the result into
JSON.
query_in_thd() is about SQL execution. For JSON conversion
we will need to create a new Protocol class.
17. Before (left) and after (right)
Before: Browser (JavaScript) -> HTTP, JSON -> Apache (PHP) -> MySQL Protocol, binary -> MySQL
After: Browser (JavaScript) -> HTTP -> MySQL
18. The speaker says...
Throughout the presentation I take short breaks to reflect upon
the work. These cliff-hangers take a step back to show the
overall architecture and progress. Don't get lost in the
source code.
On the left you see today's proxying architecture, with
Apache/PHP as an example standing in for LAMP. On the
right you see what has been created already.
19. Additional APIs would be cool
The new plugins came unexpectedly
How about a SQL service API for plugin developers?
How about a handler service API?
Plugin development would be even easier!
/* NOTE: must have to get access to THD! */
#define MYSQL_SERVER 1
/* For parsing and executing a statement */
#include "sql_class.h" // struct THD
#include "sql_parse.h" // mysql_parse()
#include "sql_acl.h" // SUPER_ACL
#include "transaction.h" // trans_commit
20. The speaker says...
The recommended books do a great job introducing you to
core MySQL components. So does the MySQL
documentation. You will quickly grasp what modules there
are. There is plenty of information on writing storage engines,
creating INFORMATION_SCHEMA tables, SQL variables,
user-defined SQL functions, but executing SQL is a bit
more difficult.
The new class of server plugins needs comprehensive
service APIs for plugin developers for accessing data,
both using SQL and using the low-level handler storage
interface.
21. #define MYSQL_SERVER 1
The story about THD (thread descriptor)...
Every client request is handled by a thread
Our daemon needs THDs and the define...
int query_in_thd() {
/* … */
my_thread_init();
thd = new THD(false);
/* From event_scheduler.cc, pre_init_event_thread(THD* thd) */
thd->client_capabilities = 0;
thd->security_ctx->master_access = 0;
thd->security_ctx->db_access = 0;
thd->security_ctx->host_or_ip = (char*) CONN_JS_HOST;
thd->security_ctx->set_user((char*) CONN_JS_USER);
my_net_init(&thd->net, NULL);
thd->net.read_timeout = slave_net_timeout;
22. The speaker says...
MySQL uses one thread for every request/client connection.
Additional system threads exist. To run a SQL statement
we must create and setup a THD object. It is THE
object passed all around during request execution.
The event scheduler source is a good place to learn
about setting up and tearing down a THD object. The
event scheduler starts SQL threads for events – just like we
start SQL threads to answer HTTP requests.
23. THD setup, setup, setup...
/* MySQL's network abstraction: vio, virtual I/O */
my_net_init(&thd->net, NULL);
thd->net.read_timeout = slave_net_timeout;
thd->slave_thread = 0;
thd->variables.option_bits |= OPTION_AUTO_IS_NULL;
thd->client_capabilities |= CLIENT_MULTI_RESULTS;
/* MySQL THD housekeeping */
mysql_mutex_lock(&LOCK_thread_count);
thd->thread_id = thd->variables.pseudo_thread_id = thread_id++;
mysql_mutex_unlock(&LOCK_thread_count);
/* Guarantees that we will see the thread in SHOW PROCESSLIST
though its vio is NULL. */
thd->proc_info = "Initialized";
thd->set_time();
DBUG_PRINT("info", ("Thread %ld", thd->thread_id));
24. THD setup, setup, setup...
/* From lib_sql.cc */
thd->thread_stack = (char*) &thd;
thd->store_globals();
/* Start lexer and put THD to sleep */
lex_start(thd);
thd->set_command(COM_SLEEP);
thd->init_for_queries();
/* FIXME: ACL ignored, super user enforced */
sctx = thd->security_ctx;
sctx->master_access |= SUPER_ACL;
sctx->db_access |= GLOBAL_ACLS;
/* Make sure we are in autocommit mode */
thd->server_status |= SERVER_STATUS_AUTOCOMMIT;
/* Set default database */
thd->db = my_strdup(CONN_JS_DB, MYF(0));
thd->db_length = strlen(CONN_JS_DB);
25. The speaker says...
The setup is done. Following the motto "make it happen,
today", we ignore some nasty details such as access control,
authorization and, in the following, charsets. It can be done,
that's for sure. I leave it to the
ones in the know, the MySQL server developers.
Access control? With HTTP? With our client-side
JavaScript code and all its secret passwords
embedded in the clear text HTML document
downloaded by the browser?
Hacking is fun!
26. Executing SQL
thd->set_query_id(get_query_id());
inc_thread_running();
/* From sql_parse.cc - do_command() */
thd->clear_error();
thd->get_stmt_da()->reset_diagnostics_area();
/* From sql_parse.cc - dispatch_command() */
thd->server_status &= ~SERVER_STATUS_CLEAR_SET;
/* Text protocol and plain question, no prepared statement */
thd->set_command(COM_QUERY);
/* To avoid confusing VIEW detectors */
thd->lex->sql_command = SQLCOM_END;
/* From sql_parse.cc - alloc_query() = COM_QUERY package parsing */
query = my_strdup(CONN_JS_QUERY, MYF(0));
/* Length excludes the terminating '\0', as in alloc_query() */
thd->set_query(query, strlen(query));
/* Free here lest PS break */
thd->rewritten_query.free();
if (thd->is_error()) {
  return 1; /* error */
}
27. Heck, where is the result?
Parser_state parser_state;
parser_state.init(thd, thd->query(), thd->query_length());
/* From sql_parse.cc */
mysql_parse(thd, thd->query(), thd->query_length(), &parser_state);
/* NOTE: multi query is not handled */
if (parser_state.m_lip.found_semicolon != NULL) { return 1; }
if (thd->is_error()) { return 1; }
thd->update_server_status();
if (thd->killed) {
  thd->send_kill_message();
  return 1;
}
/* Flush output buffers, protocol is mostly about output format */
thd->protocol->end_statement();
/* Reset THD and put to sleep */
thd->reset_query();
thd->set_command(COM_SLEEP);
28. The speaker says...
Our query has been executed. Unfortunately, the result is
gone with the wind.
MySQL has "streamed" the results during the query
execution into the Protocol object of THD. Protocol in turn
has converted the raw results from MySQL into MySQL
(text) protocol binary packets and sent them out using the
vio/net modules. We initialized net with a NULL vio earlier,
so the results are lost.
Let's hack a JSON Protocol class that returns a string to the
caller. The result is stored in a string buffer.
29. We are here...
Before: Browser (JavaScript) -> Apache (PHP) -> MySQL
After: Browser (JavaScript) -> GET /?sql=<mysql> -> MySQL
30. The speaker says...
Quick recap.
MySQL now understands the GET /?sql=<mysql> request.
<mysql> is used as a statement string. The statement has
been executed.
Next: return the result as JSON.
32. The speaker says...
The proof-of-concept daemon plugin shall be
simplistic. Thus, we derive a class from the old MySQL 4.1
style text protocol, used for calls like mysql_query(),
mysqli_query() and so forth. Prepared statements use a
different Protocol class.
The method implementation is straightforward. We map every
store_<type>() call to
json_add_result_set_column(). Everything becomes
a C/C++ string (char*, ...). Returning a numeric column
as a JSON number would be possible, too.
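A minimal sketch of that mapping, written as standalone C++ rather than against MySQL's Protocol class: `JsonResultSet`, its `store_*()` methods and `start_row()`/`end_row()` are hypothetical names that only mirror the idea of funnelling every typed store call into one column writer. Values become JSON strings, as in the PoC; emitting SQL NULL as JSON `null` is an extra assumption. The escaping follows the JSON string rules and assumes the input bytes are already UTF-8.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Escape a byte string for use inside a JSON string literal.
std::string json_escape(const std::string& s) {
  std::string out;
  for (unsigned char c : s) {
    switch (c) {
      case '"':  out += "\\\""; break;
      case '\\': out += "\\\\"; break;
      case '\b': out += "\\b"; break;
      case '\f': out += "\\f"; break;
      case '\n': out += "\\n"; break;
      case '\r': out += "\\r"; break;
      case '\t': out += "\\t"; break;
      default:
        if (c < 0x20) {  // other control characters need \u00XX
          char buf[7];
          std::snprintf(buf, sizeof buf, "\\u%04x", static_cast<unsigned>(c));
          out += buf;
        } else {
          out += static_cast<char>(c);
        }
    }
  }
  return out;
}

// Hypothetical JSON "protocol": rows become arrays, columns become strings.
class JsonResultSet {
 public:
  void start_row() { result_ += rows_++ ? ",[" : "["; columns_ = 0; }
  void end_row() { result_ += "]"; }
  // Every typed store call funnels into add_column(), as on the slide.
  void store_string(const std::string& v) { add_column(v); }
  void store_longlong(long long v) { add_column(std::to_string(v)); }
  // Assumption: SQL NULL is sent as JSON null rather than as a string.
  void store_null() { result_ += columns_++ ? ",null" : "null"; }
  std::string get_result() const { return "[" + result_ + "]"; }

 private:
  void add_column(const std::string& v) {
    result_ += columns_++ ? ",\"" : "\"";
    result_ += json_escape(v);
    result_ += "\"";
  }
  std::string result_;
  unsigned rows_ = 0, columns_ = 0;
};
```

Driving it with one row holding the integer 1 produces `[["1"]]`, the same shape the PoC returns for `SELECT 1`.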
33. JSON Protocol method
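bool Protocol_json::json_add_result_set_column(uint field_pos,
    const uchar* s, uint32 s_length) {
  DBUG_ENTER("Protocol_json::json_add_result_set_column()");
  DBUG_PRINT("info", ("field_pos %u", field_pos));
  uint32 i, j;
  uchar * buffer;
  if (0 == field_pos) { json_begin_result_set_row(); }
  json_result.append("\"");
  /* TODO CHARSETs, KLUDGE type conversions, JSON escape incomplete! */
  /* Worst case: every byte becomes a two-byte escape sequence */
  buffer = (uchar*)my_malloc(s_length * 2 * sizeof(uchar), MYF(0));
  if (!buffer) { DBUG_RETURN(TRUE); }
  for (i = 0, j = 0; i < s_length; i++, j++) {
    switch (s[i]) {
      case '"': case '\\': case '/':
        /* backslash, then the character itself */
        buffer[j++] = '\\';
        buffer[j] = s[i];
        break;
      /* control characters: backslash, then the escape letter */
      case '\b': buffer[j++] = '\\'; buffer[j] = 'b'; break;
      case '\f': buffer[j++] = '\\'; buffer[j] = 'f'; break;
      case '\n': buffer[j++] = '\\'; buffer[j] = 'n'; break;
      case '\r': buffer[j++] = '\\'; buffer[j] = 'r'; break;
      case '\t': buffer[j++] = '\\'; buffer[j] = 't'; break;
      default:
        buffer[j] = s[i];
    }
  }
  /*...*/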
34. The speaker says...
It is plain vanilla C/C++ code one has to write. Please
remember, I show proof of concept code. Production code
from the MySQL Server team is of much higher quality. For
example, can you explain the reasons for memcpy() in this
code?
void func(uchar *pos) {
ulong row_num;
memcpy(&row_num, pos, sizeof(row_num)); …
}
Leave the riddle for later. JSON is not complex!
35. Use of JSON Protocol
int query_in_thd(String * json_result) {
/* … */
thd = new THD(false);
/* JSON, replace protocol object of THD */
protocol_json.init(thd);
thd->protocol=&protocol_json;
DBUG_PRINT("info", ("JSON protocol object installed"));
/*... execute COM_QUERY SQL statement ...*/
/* FIXME, THD will call Protocol::end_statement,
the parent implementation. Thus, we cannot hook
end_statement() but need an extra call in Protocol_json
to fetch the result. */
protocol_json.json_get_result(json_result);
/* Calling should not be needed in our case */
thd->protocol->end_statement();
/*...*/
36. The speaker says...
Straightforward: we install a different protocol object for
THD and fetch the result after the query execution.
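The protocol swap can be illustrated in isolation. In this sketch `Protocol`, `TextProtocol`, `JsonProtocol`, `Session` and `execute()` are hypothetical stand-ins for MySQL's Protocol classes, THD and statement execution. The executor only talks to the base class, so installing a different object changes the output format, and the caller fetches the buffered JSON afterwards, much like `json_get_result()` on the slide. Values are not escaped here, to keep the focus on the swap.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for MySQL's Protocol: execution streams rows into it.
class Protocol {
 public:
  virtual ~Protocol() = default;
  virtual void send_row(const std::vector<std::string>& row) = 0;
  virtual void end_statement() {}
};

// Tab-separated lines, standing in for the classic wire protocol.
class TextProtocol : public Protocol {
 public:
  void send_row(const std::vector<std::string>& row) override {
    for (std::size_t i = 0; i < row.size(); ++i) out += (i ? "\t" : "") + row[i];
    out += "\n";
  }
  std::string out;
};

// Buffers rows as a JSON array of arrays; the caller fetches it afterwards.
// Sketch only: values are assumed not to need JSON escaping.
class JsonProtocol : public Protocol {
 public:
  void send_row(const std::vector<std::string>& row) override {
    buf_ += buf_.empty() ? "[" : ",[";
    for (std::size_t i = 0; i < row.size(); ++i)
      buf_ += (i ? ",\"" : "\"") + row[i] + "\"";
    buf_ += "]";
  }
  std::string get_result() const { return "[" + buf_ + "]"; }

 private:
  std::string buf_;
};

// Hypothetical session object holding the currently installed protocol.
struct Session {
  Protocol* protocol = nullptr;
};

// Hypothetical executor: streams a fixed result into whatever is installed.
void execute(Session& s, const std::vector<std::vector<std::string>>& rows) {
  for (const auto& r : rows) s.protocol->send_row(r);
  s.protocol->end_statement();
}
```

The executor never changes; only the pointer in `Session` does, which is the property the PoC relies on when it replaces `thd->protocol`.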
37. Proof: MySQL with HTTP, JSON
nixnutz@linux-rn6m:~/> curl -v http://127.0.0.1:8080/?sql=SELECT%201
* About to connect() to 127.0.0.1 port 8080 (#0)
* Trying 127.0.0.1... connected
> GET /?sql=SELECT%201 HTTP/1.1
> User-Agent: curl/7.22.0 (i686-pc-linux-gnu) libcurl/7.22.0 OpenSSL/1.0.0e zlib/1.2.5 c-ares/1.7.5 libidn/1.22 libssh2/1.2.9
> Host: 127.0.0.1:8080
> Accept: */*
>
< HTTP/1.1 200 OK
< Content-Type: text/plain
< Content-Length: 7
<
* Connection #0 to host 127.0.0.1 left intact
* Closing connection #0
[["1"]]
38. The speaker says...
The Protocol_json method for sending result set metadata
(types, column names) was not implemented, as it
proved too big a task for the PoC.
We wrap up the HTTP task with a multi-threaded,
libevhtp-based HTTP interface. Once again, copy and
adapt examples from the library documentation...
39. We are here...
Before: Browser (JavaScript) -> Apache (PHP) -> MySQL
After: Browser (JavaScript) -> GET /?sql=<mysql> -> MySQL, JSON reply back to the browser
40. The speaker says...
Quick recap.
MySQL understands a new GET /?sql=<mysql> command.
<mysql> is used as a statement string. The statement has
been executed. The result has been formatted as a JSON
document. An HTTP reply has been sent.
Next: from single-threaded to multi-threaded.
42. The speaker says...
Multi-threading comes out of the box thanks to libevhtp. Libevhtp
is a BSD library that aims to replace the HTTP functionality
in libevent.
Note the thread-local storage (TLS) of the HTTP
worker threads. I have missed the opportunity of caching
THD in TLS. Doing so may further improve performance.
A web server needs a scripting language. Let's add server-side
JavaScript! TLS will come in handy soon.
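Caching a per-worker object in thread-local storage can be sketched with C++11 `thread_local`. `ExpensiveContext` is a hypothetical stand-in for THD (or, later, a v8::Context), and the counter only exists to show that each worker thread builds its context once and then reuses it for every request it serves. Whether a real THD can safely be reused across requests this way is a question the slides leave open.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Counts how many contexts were ever created (demonstration only).
std::atomic<int> contexts_created{0};

// Hypothetical expensive per-thread object, standing in for THD.
struct ExpensiveContext {
  ExpensiveContext() { ++contexts_created; }
  int requests_served = 0;
};

// Returns this thread's context, constructing it on first use only.
ExpensiveContext& thread_context() {
  thread_local ExpensiveContext ctx;
  return ctx;
}

// Simulated HTTP worker: handles n requests, reusing its cached context.
void worker(int n) {
  for (int i = 0; i < n; ++i) ++thread_context().requests_served;
}

// Runs `threads` workers with `requests` requests each; returns the number
// of contexts that had to be built.
int run_workers(int threads, int requests) {
  contexts_created = 0;
  std::vector<std::thread> pool;
  for (int t = 0; t < threads; ++t) pool.emplace_back(worker, requests);
  for (auto& th : pool) th.join();
  return contexts_created.load();
}
```

`run_workers(4, 1000)` builds four contexts for 4,000 simulated requests: setup cost scales with the number of workers, not with the number of requests.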
43. We are here...
Diagram, 32 concurrent clients running GET /?sql=SELECT%201:
Before: JavaScript clients -> Apache (PHP) -> MySQL: 400 Req/s, load 34
After: JavaScript clients -> MySQL: 1606 Req/s, load 2.5
44. The speaker says...
MySQL understands a new GET /?sql=<mysql> command.
<mysql> is used as a statement string. The statement has
been executed. The result has been formatted as a JSON
document. An HTTP reply has been sent. The MySQL HTTP
Interface is multi-threaded. So is MySQL, and always has been.
The need for proxying (see the left) is gone. No extra
deployment of a proxying solution any more. MySQL gets
more system resources, resulting in a performance boost.
45. JavaScript for MySQL
Like with PHP extensions!
Copy daemon plugin example, add your magic
Glue libraries: Google V8 JavaScript engine (BSD)
Handle GET /?app=<name>
Chart: requests/s (0 to 2500) vs. concurrency (ab2 -c <n>, n = 1, 4, 8, 16, 32) for PHP and the server plugin
46. The speaker says...
The chart shows "Hello world" with Apache/PHP compared
to MySQL/server-side JavaScript. I am caching the
JavaScript source code once it is loaded from a database
table. JavaScript is compiled and run upon each request.
The system load reported by top during ab2 -n 50000 -c32 is 27
during the PHP test and 5 for MySQL/server-side
JavaScript...
mysql> select * from js_applications where name='internetsuperhero'\G
*************************** 1. row ***************************
name: internetsuperhero
source: function main() { return "Hello world"; } main();
47. Embedding Google V8
Cache expensive operations
Keep v8::Context in thread local storage
Cache the script code after fetching from MySQL
#include <cstdio>
#include <v8.h>
using namespace v8;
int main(int argc, char* argv[]) {
HandleScope handle_scope;
Persistent<Context> context = Context::New();
Context::Scope context_scope(context);
Handle<String> source = String::New("'Hello' + ', World!'");
Handle<Script> script = Script::Compile(source);
Handle<Value> result = script->Run();
context.Dispose();
String::AsciiValue ascii(result);
printf("%s\n", *ascii);
return 0;
}
48. The speaker says...
Google V8 is the JavaScript engine used in Google Chrome,
Google's open source browser. It is written in C++ and said
to be a fast engine. It is used by node.js and some NoSQL
databases.
Armed with the previously developed function
query_in_thd() to fetch the source of a "MySQLApp" stored
in a table into a string, it's easy going. Learn the basic
concepts of V8 and make it happen. Once done with the V8
documentation, study http://www.codeproject.com/Articles/29109/Using-V8-Google-s-Chrome-JavaScript-Virtual-Machin
50. The speaker says...
Final code would store the source in a system table. The
table would be accessed through the handler interface.
NoSQL would be used, so to say. Many integrity checks
would be done.
However, you haven't learned yet how to use the
Handler interface. Thus, we use what we have:
query_in_thd(). It is amazing how far we can get with
only one function.
52. The speaker says...
To boost the performance we cache v8::Context in
the thread-local storage of our HTTP worker threads.
The v8::Context is needed for compiling and running scripts.
A v8::Context contains all built-in utility functions and
objects.
For fast multi-threaded V8, each HTTP worker gets its own
v8::Isolate object. We want more than one global
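v8::Isolate to boost concurrency. Isolate? Think of it as an
independent instance of the V8 engine with its own heap: only one
thread at a time may run code in a given Isolate. Additionally,
we cache the script source code in the TLS.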
Teach your HTTP server to call the functions. Done.
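The script-source cache can be sketched along the same lines. `cached_source()` and the `SourceLoader` callback are hypothetical: in the PoC the loader role is played by query_in_thd() reading the js_applications table. Each worker thread keeps its own map, so no locking is needed, but nothing is ever invalidated, so a changed application would only be picked up by threads that have not cached it yet. Compiling the source into a V8 script is left out.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <unordered_map>

// Hypothetical loader: fetches an application's source by name, e.g. via
// "SELECT source FROM js_applications WHERE name = ...".
using SourceLoader = std::function<std::string(const std::string& name)>;

// Returns the source for `name`, calling `load` only on this thread's first
// request for that name. Each worker thread has its own map: no locking.
// References stay valid because unordered_map never moves its elements.
const std::string& cached_source(const std::string& name,
                                 const SourceLoader& load) {
  thread_local std::unordered_map<std::string, std::string> cache;
  auto it = cache.find(name);
  if (it == cache.end()) it = cache.emplace(name, load(name)).first;
  return it->second;
}
```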
53. Proof: Server-side JavaScript
~/> curl -v http://127.0.0.1:8080/?app=internetsuperhero
* About to connect() to 127.0.0.1 port 8080 (#0)
* Trying 127.0.0.1... connected
> GET /?app=internetsuperhero HTTP/1.1
> User-Agent: curl/7.22.0 (i686-pc-linux-gnu) libcurl/7.22.0 OpenSSL/1.0.0e zlib/1.2.5 c-ares/1.7.5 libidn/1.22 libssh2/1.2.9
> Host: 127.0.0.1:8080
> Accept: */*
>
< HTTP/1.1 200 OK
< Content-Type: text/plain
< Content-Length: 11
<
* Connection #0 to host 127.0.0.1 left intact
* Closing connection #0
Hello world
55. We are here...
Diagram, 32 concurrent clients running GET hello.php vs. GET /?app=hello:
Apache with PHP: 1107 Req/s, load 27
MySQL with server-side JavaScript: 2360 Req/s, load 5
56. The speaker says...
Quick recap.
MySQL has a built-in multi-threaded web server. Users can
use JavaScript for server-side scripting. "Hello world" runs
faster than Apache/PHP. The CPU load is lower.
This is an intermediate step. This is not a new general
purpose web server.
Next: Server-side JavaScript gets SQL access.
57. Server-side JS does SELECT 1
Like with PHP extensions!
Copy daemon plugin example, add your magic
Glue libraries: Google V8 JavaScript engine (BSD)
Handle GET /?app=<name>
Chart: requests/s (0 to 1600) vs. concurrency (ab2 -c <n>, n = 1, 4, 8, 16, 32) for PHP and the server plugin
58. The speaker says...
The charts and the system load are what we all expect. The
MySQL Server daemon plugin proof of concept remains in
the top position. It is faster and uses less CPU.
Here's the server-side JavaScript I have benchmarked. The
PHP counterpart uses mysqli to execute "SELECT 1"
and converts the result into JSON.
mysql> select * from js_applications where name='select'\G
*************************** 1. row ***************************
name: select
source: function main() { return ulf("SELECT 1"); } main();
1 row in set (0,00 sec)
60. The speaker says...
Sometimes programming means gluing pieces together.
This time, query_in_thd() is connected with V8.
Imagine server-side JavaScript had access to more
functions to fetch data. That would be fantastic for map &
reduce, assuming you want it.
61. Proof: JS runs ulf('SELECT 1')
> curl -v http://127.0.0.1:8080/?app=select
* About to connect() to 127.0.0.1 port 8080 (#0)
* Trying 127.0.0.1... connected
> GET /?app=select HTTP/1.1
> User-Agent: curl/7.22.0 (i686-pc-linux-gnu) libcurl/7.22.0 OpenSSL/1.0.0e zlib/1.2.5 c-ares/1.7.5 libidn/1.22 libssh2/1.2.9
> Host: 127.0.0.1:8080
> Accept: */*
>
< HTTP/1.1 200 OK
< Content-Type: text/plain
< Content-Length: 7
<
* Connection #0 to host 127.0.0.1 left intact
* Closing connection #0
[["1"]]
62. The speaker says...
SELECT 1 is great to show the base performance of a
technology. SQL runtime is as short as possible. SQL
runtime is constant. SQL runtime contributes little to overall
runtime. With long-running SQL, the majority of the time is
spent on SQL, and the differences between proxying through
Apache/PHP and server-side JavaScript diminish.
But SELECT 1 is still boring. What if we put a BLOB into
MySQL, store JSON documents in it and filter them at
runtime using server-side JavaScript?
63. We are here...
Diagram, 32 concurrent clients running GET /?app=select1:
Before: JavaScript clients -> Apache (PHP) -> MySQL: 448 Req/s, load 34
After: JavaScript clients -> MySQL with server-side JavaScript: 1312 Req/s, load 5.5
64. The speaker says...
This is a second intermediate step on the way to the main
question that has driven the author: how could MySQL be
more than SQL. MySQL is not only SQL.
Next: poor man's document storage.
65. JS to filter JSON documents
Like with PHP extensions!
Copy daemon plugin example, add magic
Glue libraries: Google V8 JavaScript engine (BSD)
Handle GET /?app=<name>
Chart: requests/s (0 to 800) vs. concurrency (ab2 -c <n>, n = 1, 4, 8, 16, 32) for PHP and the server plugin
66. The speaker says...
MySQL with server-side JavaScript is still faster than PHP, but
at 32 concurrent clients the system load reported
by top is 9. Have we reached a dead end?
Or should I buy myself a new notebook? The subnotebook
that runs all benchmarks in a VM is four years old. Please
take my absolute performance figures with a grain of salt.
Benchmarking on modern commodity server hardware was
not a goal.
67. We are here...
Diagram, 32 concurrent clients running GET /?map=greetings, JSON documents stored in a BLOB:
Before: JavaScript clients -> Apache (PHP) -> MySQL: 358 Req/s, load 33.5
After: JavaScript clients -> MySQL with server-side JavaScript: 641 Req/s, load 9
68. The speaker says...
The illustration shows the vision together with first benchmark
impressions.
This is how far I got linking three BSD software libraries to
the MySQL Server using a MySQL Server daemon plugin.
Allow me a personal note: this was the status a week before
the presentation.
69. Mission JSON document mapping
Use the full potential of MySQL as storage
Stream results into the map() function?
Handler interface instead of SQL
Cache the result – create a view
70. The speaker says...
Attention - you are leaving the relational model and
entering the NoSQL section of this talk.
We are no longer talking about relations. We talk about
JSON documents. SQL as an access language can't be used
anymore. We must map & reduce documents to filter out
information, cache results and use triggers to maintain
integrity between derived, cached documents and originals.
If you want, you can have access to the API that MySQL uses
internally to read and write rows while executing SQL: the handler interface.
71. Maybe the loops are slow?
Too many iterations to filter the document?
First loop inside the plugin to fetch rows
Storing all rows in a string eats memory
Second loop inside the server-side JavaScript
function filter_names() {
  var s = ulf("SELECT document FROM test_documents");
  /* ulf() returns rows as arrays of columns, e.g. [["{...}"]] */
  var docs = JSON.parse(s);
  var res = [];
  for (var i = 0; i < docs.length; i++) {
    var doc = JSON.parse(docs[i][0]);
    if (doc.firstname !== undefined) {
      /* push() avoids holes that JSON.stringify() turns into null */
      res.push("Hi " + doc.firstname);
    }
  }
  return JSON.stringify(res);
}
72. The speaker says...
The map() function API is not beautiful. First, we iterate
over all rows in our plugin and create a string. Then, we
pass the string to JavaScript and run the same loop again.
73. Maybe this is faster?
Use handler interface
Open table
For each row: populate C++ object with document
For each row: run map() function and access object
function map(){
var res;
var row = JSON.parse(doc.before);
if (row.firstname !== undefined)
res = "Hi " + row.firstname;
doc.after = JSON.stringify(res);
}
map();
74. The speaker says...
The user API, the JavaScript function, is still not nice, but it is a
step forward. A good one?
76. The speaker says...
To demonstrate the handler interface, I show a function
that copies all rows from one table to another. It is assumed
that the tables have identical structures. The loop has
most of what is needed to create a "view" or read
from a "view".
Before you can use the handler interface, you must create a
THD object. Use the setup and tear down code from
query_in_thd(). Once done, create a table list to be passed
to open_and_lock_tables(), tell the handler that we will
access all rows and announce our plan to start reading by
calling ha_rnd_init().
77. Using the handler interface
do {
  if ((err = table_from->file->ha_rnd_next(table_to->record[0]))) {
    switch (err) {
      case HA_ERR_RECORD_DELETED:
        continue; /* skip deleted rows, keep scanning */
      case HA_ERR_END_OF_FILE:
        goto close;
      default:
        table_from->file->print_error(err, MYF(0));
        goto close;
    }
  } else {
    table_to->file->ha_write_row(table_to->record[0]);
  }
} while (1);
close:
/* from sql_base.cc - open_and_lock_tables failure */
table_from->file->ha_rnd_end();
if (! thd->in_sub_stmt) {
trans_commit_stmt(thd);
}
close_thread_tables(thd);
78. The speaker says...
Read rows from one table into a buffer and write the buffer
into the target table. Stop in case of an error or when all
rows have been read. Such loops can be found all over
the MySQL Server code.
When done, close the table handles and tear down THD
before exiting.
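To follow the control flow of such a scan without a MySQL build, here is a toy model. `ToyTable`, `Status` and the `rnd_init()`/`rnd_next()`/`rnd_end()`/`write_row()` members are hypothetical and only mirror the roles of ha_rnd_init(), ha_rnd_next(), ha_rnd_end() and ha_write_row(). A deleted-record status is treated as "skip and continue" rather than as end of scan, which matches how the server's sequential-scan code treats HA_ERR_RECORD_DELETED; it is worth verifying against the source of the MySQL version in use.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

enum class Status { Ok, RecordDeleted, EndOfFile };

// Hypothetical table: a vector of rows; std::nullopt marks a deleted row.
struct ToyTable {
  std::vector<std::optional<std::string>> rows;
  std::size_t cursor = 0;

  void rnd_init() { cursor = 0; }                // announce a full scan
  Status rnd_next(std::string& record) {         // read next row into buffer
    if (cursor >= rows.size()) return Status::EndOfFile;
    const auto& row = rows[cursor++];
    if (!row) return Status::RecordDeleted;
    record = *row;
    return Status::Ok;
  }
  void rnd_end() {}                              // finish the scan
  void write_row(const std::string& record) { rows.emplace_back(record); }
};

// Copies all live rows from `from` to `to`; returns the number copied.
std::size_t copy_rows(ToyTable& from, ToyTable& to) {
  std::string record;  // shared record buffer, like table->record[0]
  std::size_t copied = 0;
  from.rnd_init();
  for (;;) {
    Status st = from.rnd_next(record);
    if (st == Status::EndOfFile) break;
    if (st == Status::RecordDeleted) continue;  // skip, keep scanning
    to.write_row(record);
    ++copied;
  }
  from.rnd_end();
  return copied;
}
```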
79. Extracting data for map()
my_bitmap_map * old_map;
my_ptrdiff_t offset;
Field * field;
::String tmp, *val;
/*...*/
do { /* the handler loop */
  old_map = dbug_tmp_use_all_columns(table_from, table_from->read_set);
  offset = (my_ptrdiff_t)0;
  for (i = 0; i < table_from->s->fields; i++) {
    field = table_from->field[i];
    field->move_field_offset(offset);
    if (!field->is_null()) {
      /* document is the C++/JavaScript data exchange object */
      document->before = field->val_str(&tmp, &tmp);
      /* run map(); script is the compiled v8::Script */
      result = script->Run();
      /* store modified value */
      field->store(document->after.c_ptr(), document->after.length(),
                   system_charset_info);
    }
    /* restore the offset for every column, NULL or not */
    field->move_field_offset(-offset);
  }
  dbug_tmp_restore_column_map(table_from->read_set, old_map);
  /* ... */
} while (1);
80. The speaker says...
This code goes into the handler loop instead of the simple
copy done with table_to->file->ha_write_row(table_to->record[0]);
For reference, it shows how to loop over all columns of a
row and extract the data. In the case of document mapping,
one needs to read only the data of the BLOB column and
call the JavaScript map() function.
A C++ object is used for data exchange with JavaScript.
The object is populated before the map() function is run and
inspected afterward.
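The data exchange can be modelled in plain C++. In this sketch, a hypothetical `Document` object with `before` and `after` members is filled for each row, a map callback runs, and the result is read back. `std::function` stands in for running the compiled V8 script, and rows are simple `key=value` strings instead of JSON in a BLOB; both are simplifications. The `calls` counter makes visible that there is one C++-to-script transition per row.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Exchange object shared by the C++ loop and the (simulated) map() script.
struct Document {
  std::string before;  // column value handed to map()
  std::string after;   // value map() hands back
};

// Runs `map` once per row and keeps each non-empty result. The number of
// map() invocations is reported through `calls`.
std::vector<std::string> map_rows(const std::vector<std::string>& rows,
                                  const std::function<void(Document&)>& map,
                                  int& calls) {
  std::vector<std::string> out;
  Document doc;
  calls = 0;
  for (const auto& row : rows) {
    doc.before = row;  // populate before running map()
    doc.after.clear();
    map(doc);          // one C++ -> script transition per row
    ++calls;
    if (!doc.after.empty()) out.push_back(doc.after);  // inspect afterwards
  }
  return out;
}
```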
81. Surprise: no major difference
Are C++/V8-JS context switches expensive?
Calling JS for every row is a bad idea?
Using C++ object for data exchange does not fly?
We should send rows in batches to reduce switches
Chart: requests/s (0 to 800) vs. concurrency (ab2 -c <n>, n = 1, 4, 8, 16, 32) for the server plugin using SQL and the server plugin using the handler interface
82. The speaker says...
Calling the map function for every row reduces the
performance a bit. Let's recap how good the performance is: it
is still roughly twice as fast as the PHP/Apache proxying approach.
Detailed bottleneck analysis and further benchmarking are
beyond the scope and interest of this proof of concept. It
has been shown that mapping is possible, at very
reasonable performance.
83. Single-threaded read
8,300 documents mapped per second with V8
8,700 docs/s if map() is an empty function
11,500 docs/s if not calling map()
12,800 docs/s is the base without v8 during read
Chart: documents processed per second (0 to 14000) at concurrency 1 (ab2 -c 1) for: no V8 in loop, V8 but no script run, V8 with empty map function, V8 with filtering map function
84. The speaker says...
There is a simple way to get to the baseline
of 12,800 documents read per second: we cache
the result in a "view".
The view is a SQL table that the plugin creates when the view is
accessed for the first time. Triggers could be used to update
a view whenever the underlying data changes.
Please note, the figure of 12,800 is extrapolated from ab2 -n
1000 -c 1 127.0.0.1:8080/?map=<name> to repeatedly
scan a small table with 522 rows (documents) using the
handler interface.
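The caching idea can be sketched as materialize-on-first-access. `ViewCache`, its `compute` callback and `invalidate()` are hypothetical: in the proposal the cache is a SQL table created by the plugin, and invalidation would be driven by triggers on the underlying table. The sketch is not thread-safe; a plugin serving many HTTP workers would need locking or per-thread caches.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

using Rows = std::vector<std::string>;

// Hypothetical cache of mapped results ("views"), keyed by map() name.
class ViewCache {
 public:
  explicit ViewCache(std::function<Rows(const std::string&)> compute)
      : compute_(std::move(compute)) {}

  // First access runs the expensive map(); later accesses read the cache.
  const Rows& get(const std::string& name) {
    auto it = views_.find(name);
    if (it == views_.end()) it = views_.emplace(name, compute_(name)).first;
    return it->second;
  }

  // Called when the underlying documents change (a trigger, in the proposal).
  void invalidate(const std::string& name) { views_.erase(name); }

 private:
  std::function<Rows(const std::string&)> compute_;
  std::map<std::string, Rows> views_;
};
```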
85. Map and reduce with MySQL
Diagram, 32 concurrent clients running GET /map=greeting against JSON documents in MySQL:
MySQL, SQL access plus JavaScript: 641 Req/s, load 9
MySQL, handler interface plus JavaScript: 571 Req/s, load 9
86. The speaker says...
As the name says, Map&Reduce is a two-stage process.
Mapping is optionally followed by reducing. If you are new
to map and reduce, think of reduce as the aggregation step
in SELECT <column> FROM <table> GROUP BY <criteria>.
It has been shown that map() is possible. Results can be
persisted in a table. Reducing can be understood as a second
mapping that works on the results of the map() function.
Mapping has been shown to be possible, thus
reducing is possible, too. Implementing it was beyond the author's
goals.
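To make the GROUP BY analogy concrete, here is a minimal in-memory map and reduce. Documents are assumed to be already parsed into key/value maps (the PoC keeps them as JSON in a BLOB), and `map_docs()`/`reduce_pairs()` are illustrative names. map() emits a `(firstname, 1)` pair per document that has a first name; reduce() is a second pass over those pairs that sums per key, which corresponds to `SELECT firstname, COUNT(*) ... GROUP BY firstname`.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

using Doc = std::map<std::string, std::string>;  // an already-parsed document
using Pair = std::pair<std::string, int>;        // emitted (key, value)

// map(): emit (firstname, 1) for every document that has a firstname.
std::vector<Pair> map_docs(const std::vector<Doc>& docs) {
  std::vector<Pair> emitted;
  for (const Doc& d : docs) {
    auto it = d.find("firstname");
    if (it != d.end()) emitted.emplace_back(it->second, 1);
  }
  return emitted;
}

// reduce(): a second pass over map() output, aggregating values per key.
std::map<std::string, int> reduce_pairs(const std::vector<Pair>& pairs) {
  std::map<std::string, int> totals;
  for (const Pair& p : pairs) totals[p.first] += p.second;
  return totals;
}
```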
87. Areas for future work
Imagine someone created a BLOB-optimized storage engine
for MySQL. Storage engine development is covered in the
books...
Imagine Websocket would be used instead of HTTP.
Websocket is a raw "do as you like" connection with much
less overhead. GET /?sql=SELECT%201 returns 71 bytes, of
which 7 are the payload...
Imagine Websocket would be used: transactions, events,
streaming, all this is within reach...
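The overhead argument can be quantified. The 71 bytes match the curl output of slide 37: the status line plus the Content-Type, Content-Length and blank header lines add up to 64 bytes, followed by the 7-byte payload `[["1"]]`. For comparison, the WebSocket protocol (RFC 6455) frames a payload of up to 125 bytes with a 2-byte header, growing to 4 or 10 bytes for longer payloads, plus a 4-byte masking key on client-to-server frames. The function below computes that framing size for one unfragmented frame; it ignores the one-time HTTP upgrade handshake and any extensions.

```cpp
#include <cassert>
#include <cstdint>

// Bytes on the wire for one unfragmented WebSocket frame (RFC 6455, 5.2):
// a 2-byte base header, an extended length field for payloads > 125 bytes,
// and a 4-byte masking key on client-to-server frames.
std::uint64_t websocket_frame_size(std::uint64_t payload, bool from_client) {
  std::uint64_t header = 2;
  if (payload > 65535) header += 8;     // 64-bit extended payload length
  else if (payload > 125) header += 2;  // 16-bit extended payload length
  if (from_client) header += 4;         // masking key
  return header + payload;
}
```

For the 7-byte `[["1"]]` reply, a server-to-client frame would be 9 bytes on the wire, against 71 bytes for the HTTP reply.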
88. PS: This is a proof of concept. No less, no more. I have created it in my after-work office. For the next couple of weeks I plan to focus on nothing but my wedding. Otherwise the bride may decide that a 19-year-long evaluation is not enough. She might fear I could be coding during the ceremony...
Would you create the MySQL HTTP Interface, today? I'm busy with the wedding.
Happy Hacking!