HandlerSocket plugin for MySQL (English)

This slide is a translation of http://www.slideshare.net/akirahiguchi/handlersocket-plugin-for-mysql-4664154

  1. HandlerSocket plugin for MySQL
      Jun 29, 2010
      DeNA Technology Seminar @ Yoyogi
      IT Platform Dept., System Management Division
      DeNA Co.,Ltd.
      Akira Higuchi <higuchi dot akira at dena dot jp>
  2. Who am I?
      • Akira Higuchi, Ph.D. in science
      • IT Platform Dept., DeNA Co.,Ltd.
        - System-wide performance optimization
        - Middleware development
      • The creator of the HandlerSocket plugin
      • Using GNU/Linux since 1993
        - Fedora: yum install KoboDeluxe
        - Debian: apt-get install kobodeluxe
  3. About HandlerSocket plugin
  4. What is HandlerSocket?
      • Non-SQL interface for MySQL
  5. What HandlerSocket aims for
      • Execute simple CRUD operations fast
        - Omit SQL parsing
        - Combine multiple requests on the server side
      • Allow SQL on the same database
        - Only simple operations are made faster
        - Seamless migration from SQL queries
  6. HandlerSocket plugin
      • Offers a direct, non-SQL interface to MySQL storage engines
      • Has its own TCP/IP listener
      • Talks a text protocol
      • Client libraries are available for C++ and Perl
      • Only works on Linux
      • The source code is here:
        https://github.com/ahiguti/HandlerSocket-Plugin-for-MySQL
      • More info on the DeNA Tech Blog (in Japanese):
        http://engineer.dena.jp/
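Slide 6 says HandlerSocket talks a text protocol. The sketch below shows how one request line could be encoded, assuming the escaping rules described in the protocol notes that ship with the plugin source (they are not on these slides): fields are joined by TAB and the line ends with LF; SQL NULL is sent as a single 0x00 byte; any byte from 0x00 to 0x0f is sent as 0x01 followed by the byte plus 0x40; all other bytes are sent unchanged. The function names are hypothetical and are not part of libhsclient.

```python
def encode_token(value):
    """Encode one field (assumed rules): None -> NULL, int/str/bytes -> escaped bytes."""
    if value is None:
        return b"\x00"                      # NULL is a single NUL byte
    if isinstance(value, int):
        value = str(value)
    if isinstance(value, str):
        value = value.encode("utf-8")
    out = bytearray()
    for byte in value:
        if byte < 0x10:                     # control bytes, including TAB and LF
            out += bytes((0x01, byte + 0x40))
        else:
            out.append(byte)
    return bytes(out)


def decode_token(token):
    """Inverse of encode_token: a lone 0x00 is NULL; 0x01 x decodes to x - 0x40."""
    if token == b"\x00":
        return None
    out = bytearray()
    i = 0
    while i < len(token):
        if token[i] == 0x01 and i + 1 < len(token):
            out.append(token[i + 1] - 0x40)
            i += 2
        else:
            out.append(token[i])
            i += 1
    return bytes(out)


def encode_line(fields):
    """Join the encoded fields with TAB and terminate the line with LF."""
    return b"\t".join(encode_token(f) for f in fields) + b"\n"
```

Because TAB (0x09) and LF (0x0a) fall in the escaped range, a value can never be confused with a field or line separator, which keeps the server-side parser trivial.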
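  7. Architecture
      [Diagram: inside mysqld, the SQL layer (reached through the listener for libmysql) and the HandlerSocket plugin both sit on top of the handler interface, which fronts InnoDB, MyISAM, and other storage engines. Client applications talk to the SQL layer through libmysql and to the HandlerSocket plugin through libhsclient.]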
  8. Other NoSQL interfaces to MySQL
      • mycached
        - http://developer.cybozu.co.jp/kazuho/2009/08/myc
        - Works with any storage engine
        - Talks the memcached protocol
      • NDB API
        - http://dev.mysql.com/doc/ndbapi/en/index.html
        - Dedicated to the ndbcluster engine
  9. Performance
  10. Performance
      [Chart: read throughput in requests/sec]
                        1 column   50 columns
        handlersocket    241,009      159,407
        libmysql          60,191       15,771
      • HandlerSocket executes simple read queries 4x faster than mysqld/libmysql
      • It is very effective when many columns are retrieved
      • The reason is described later
  11. Commands supported by HandlerSocket (for reading data)
      • In pseudo-SQL:
          SELECT f1, ..., fn FROM db.table
          WHERE (k1, ..., km) = (v1, ..., vm)
          ORDER BY index_i LIMIT offset, limit
      • (k1, ..., km) are the key fields of index_i, or a prefix of them
      • =, >=, >, <=, and < can be used as the comparator
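The pseudo-SQL above maps onto a single request line. A minimal sketch of that mapping, assuming the field order `<indexid> <op> <vlen> <v1> ... <vn> [<limit> <offset>]`, which is consistent with the strace on slide 33 and the Perl example on slide 37. `find_request` is a hypothetical helper; it does no escaping, so key values are assumed to contain no bytes below 0x10.

```python
VALID_OPS = {"=", ">", ">=", "<", "<="}


def find_request(index_id, op, key_values, limit=None, offset=None):
    """Build one find request line (hypothetical helper, no escaping).

    key_values holds v1..vm for the key columns (or a prefix of them) of the
    index that was opened under index_id.
    """
    if op not in VALID_OPS:
        raise ValueError("unsupported comparator: %r" % op)
    fields = [str(index_id), op, str(len(key_values))]
    fields += [str(v) for v in key_values]
    if limit is not None:
        fields.append(str(limit))
        if offset is not None:
            fields.append(str(offset))
    elif offset is not None:
        raise ValueError("offset requires limit")
    return ("\t".join(fields) + "\n").encode("utf-8")
```

For example, `SELECT k, v FROM db1.table1 WHERE k >= '55' ORDER BY PRIMARY LIMIT 20, 10`, with PRIMARY opened as index 1, would become `find_request(1, ">=", ["55"], limit=10, offset=20)`. Note that the pseudo-SQL puts the offset first, while the request line puts the limit first.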
  12. Commands supported by HandlerSocket (for modifying data)
      • UPDATE, DELETE, and INSERT
      • Transactions are not supported
      • Modifications are recorded in the binary log in row-based format
      • Modifications are durable
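Modifications can be sketched as request lines too. Assumed formats: an insert is `<indexid> + <vlen> <v1> ... <vn>`; an update or delete is a find request (with limit and offset) followed by `U` and the new column values, or by `D`. The update form matches the `'U'` entry in the Perl example on slide 37. The helper names are hypothetical, values are not escaped, and such requests would be sent to the writer port (handlersocket_port_wr, 9999 on slide 39).

```python
def _line(fields):
    """TAB-join the fields and terminate with LF (no escaping)."""
    return ("\t".join(str(f) for f in fields) + "\n").encode("utf-8")


def insert_request(index_id, values):
    """'+' inserts one row; values follow the column list given to open_index."""
    return _line([index_id, "+", len(values), *values])


def update_request(index_id, op, key_values, new_values, limit=1, offset=0):
    """Find rows like a read request, then overwrite their columns with 'U'."""
    return _line([index_id, op, len(key_values), *key_values,
                  limit, offset, "U", *new_values])


def delete_request(index_id, op, key_values, limit=1, offset=0):
    """Find rows like a read request, then delete them with 'D'."""
    return _line([index_id, op, len(key_values), *key_values,
                  limit, offset, "D"])
```

Reusing the find syntax for UPDATE and DELETE means the server only needs one row-locating code path; the trailing `U` or `D` just says what to do with the rows it found.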
  13. Command example
      • create table db1.table1 (k int key, v char(20))
      • insert into db1.table1 values (234, 'foo'), (678, 'bar')

        $ telnet localhost 9998
        Trying 127.0.0.1...
        Connected to localhost.
        Escape character is '^]'.
        P 0 db1 table1 PRIMARY k,v    (opens the PK)
        0 1
        0 = 1 234                     (find k = 234)
        0 2 234 foo
        0 = 1 678                     (find k = 678)
        0 2 678 bar

      Fields are separated by TAB characters; they are shown as spaces here.
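The telnet session above can be reproduced from a program. In this sketch, `run_session` opens one TCP connection, sends each request as a TAB-separated line, and reads one response line per request. It assumes the response shape seen in the transcript, `<errorcode> <numcolumns> <values...>` with the returned rows flattened into one line, and it does no escaping. Both function names are hypothetical.

```python
import socket


def parse_response(line):
    """Split one response line into (error_code, num_columns, rows)."""
    fields = line.rstrip(b"\n").split(b"\t")
    code = int(fields[0])
    if code != 0:
        raise RuntimeError("HandlerSocket error %d: %r" % (code, fields[2:]))
    ncols = int(fields[1])
    values = [f.decode("utf-8") for f in fields[2:]]
    # Regroup the flat value list into rows of ncols values each.
    rows = [tuple(values[i:i + ncols]) for i in range(0, len(values), ncols)] if ncols else []
    return code, ncols, rows


def run_session(host, port, requests):
    """Send each request (a list of string fields) and collect parsed responses."""
    results = []
    with socket.create_connection((host, port)) as sock, sock.makefile("rwb") as f:
        for fields in requests:
            f.write(("\t".join(fields) + "\n").encode("utf-8"))
            f.flush()
            results.append(parse_response(f.readline()))
    return results
```

Against the table created on this slide, `run_session("localhost", 9998, [["P", "0", "db1", "table1", "PRIMARY", "k,v"], ["0", "=", "1", "234"], ["0", "=", "1", "678"]])` would parse the transcript's responses as `(0, 1, [])`, `(0, 2, [("234", "foo")])` and `(0, 2, [("678", "bar")])`.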
  14. Why fast?
      • No SQL parsing → low CPU usage
      • Executes multiple requests in bulk → low CPU/disk usage
      • Own client/server protocol → small network transmission size
  15. Eliminating CPU consumption
  16. oprofile results – libmysql/mysqld
      • Executes "SELECT v from table where k = ?" many times
          samples        %  image name
          9669940  53.1574  mysqld
          4438098  24.3970  vmlinux
          1835976  10.0927  libpthread-2.5.so
          1680656   9.2389  libc-2.5.so
           397970   2.1877  e1000e
            89136   0.4900  oprofiled
            42881   0.2357  oprofile
  17. oprofile results – libmysql/mysqld
      CPU usage inside mysqld:
          samples       %  symbol name
           748022  7.7355  MYSQLparse(void*)
           219702  2.2720  my_pthread_fastmutex_lock
           205606  2.1262  make_join_statistics(JOIN*, TABLE_LIST*,
           198234  2.0500  btr_search_guess_on_hash
           180731  1.8690  JOIN::optimize()
           177120  1.8317  row_search_for_mysql
           171185  1.7703  lex_one_token(void*, void*)
           162683  1.6824  alloc_root
           131823  1.3632  read_view_open_now
           122795  1.2699  mysql_select(THD*, Item***, TABLE_LIST*,
           100276  1.0370  open_table(THD*, TABLE_LIST*, st_mem_root*,
            99575  1.0297  mem_pool_fill_free_list
            96434  0.9973  build_template(row_prebuilt_struct*, THD*,
  18. oprofile results – libmysql/mysqld
      CPU usage inside the Linux kernel:
          samples       %  symbol name
           204393  4.6054  schedule
           118648  2.6734  tcp_sendmsg
           115832  2.6099  tcp_recvmsg
           106537  2.4005  tcp_v4_rcv
           103915  2.3414  tcp_ack
           103534  2.3328  system_call
            93864  2.1150  dev_queue_xmit
            86831  1.9565  __mod_timer
            85891  1.9353  tcp_rcv_established
            84083  1.8946  .text.task_rq_lock
  19. oprofile results – libmysql/mysqld
      libmysql/mysqld:
      • Much CPU time is spent in mysqld
      • Parsing SQL is slow
      • schedule() is called frequently
  20. oprofile results – HandlerSocket
      CPU usage inside MySQL with HandlerSocket:
          samples        %  image name
          1919039  51.0453  vmlinux
           811998  21.5987  mysqld
           421215  11.2041  libpthread-2.5.so
           207166   5.5105  e1000e
           191566   5.0955  handlersocket.so
           188618   5.0171  libc-2.5.so
            13622   0.3623  oprofiled
             5707   0.1518  oprofile
  21. oprofile results – HandlerSocket
      CPU consumption in mysqld:
          samples        %  symbol name
           119684  14.7394  btr_search_guess_on_hash
            58202   7.1678  row_search_for_mysql
            46946   5.7815  mutex_delay
            38617   4.7558  my_pthread_fastmutex_lock
            37707   4.6437  buf_page_get_known_nowait
            36528   4.4985  rec_get_offsets_func
            34625   4.2642  build_template(row_prebuilt_struct*, THD*, TABLE*,
            20024   2.4660  row_sel_store_mysql_rec
            19347   2.3826  btr_cur_search_to_nth_level
            16701   2.0568  row_sel_convert_mysql_key_to_innobase
            13343   1.6432  cmp_dtuple_rec_with_match
            11381   1.4016  ha_innobase::index_read(unsigned char*,
            11176   1.3764  dict_index_copy_types
            10762   1.3254  mtr_memo_slot_release
            10734   1.3219  ha_innobase::init_table_handle_for_HANDLER()
  22. oprofile results – HandlerSocket
      CPU consumption in the Linux kernel:
          samples       %  symbol name
           129038  6.7241  tcp_sendmsg
            80080  4.1729  tcp_v4_rcv
            69658  3.6298  dev_queue_xmit
            66171  3.4481  .text.skb_release_data
            63316  3.2994  __qdisc_run
            60279  3.1411  tcp_recvmsg
            59703  3.1111  ip_output
            58462  3.0464  .text.skb_release_head_state
            48876  2.5469  tcp_ack
            48733  2.5394  __alloc_skb
            45660  2.3793  ip_queue_xmit
            44671  2.3278  tcp_transmit_skb
  23. oprofile results – HandlerSocket
      HandlerSocket:
      • Most CPU time is consumed in the kernel
      • schedule() is not called frequently
      • Inside mysqld, InnoDB consumes most of the CPU time
  24. Executing multiple requests in bulk
  25. Threading model
      mysqld:
      • Thread per connection (MySQL 5)
      • Thread pooling (MySQL 6?)
  26. Threading model
      HandlerSocket:
      • Small number of threads
      • Many connections per thread
      • Uses epoll()
      • Virtually unlimited number of concurrent connections
      • Small memory footprint
  27. HandlerSocket reader thread
      [Diagram: the handlersocket reader thread]
      1) reads requests from many clients
      2) locks the DB and gets a read view
      3) executes many requests
      4) unlocks the DB
      5) returns responses to clients
      → locks/unlocks the DB only (1/#conns) times per request
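The toy model below (plain Python, not the plugin's C++ code) mimics the reader-thread loop from slides 26 and 27: one thread watches many connections with `selectors` (which uses epoll on Linux), collects every complete request line that is ready, takes the database lock once for the whole batch, and then replies. `ToyReaderThread`, its dict-backed "index", and the one-column find it answers are all assumptions made for illustration.

```python
import selectors
import socket
import threading


class ToyReaderThread:
    """Hypothetical single-threaded, many-connection reader with batched locking."""

    def __init__(self, table):
        self.table = table                  # dict standing in for an index: key -> value
        self.db_lock = threading.Lock()
        self.lock_count = 0                 # how many times the DB lock was taken
        self.sel = selectors.DefaultSelector()
        self.partial = {}                   # per-connection unterminated input

    def add_connection(self, sock):
        sock.setblocking(False)
        self.partial[sock] = b""
        self.sel.register(sock, selectors.EVENT_READ)

    def run_once(self, timeout=1.0):
        # 1) Read requests from every connection that is ready.
        batch = []
        for key, _ in self.sel.select(timeout):
            sock = key.fileobj
            data = sock.recv(65536)
            if not data:                    # client closed the connection
                self.sel.unregister(sock)
                del self.partial[sock]
                sock.close()
                continue
            *lines, rest = (self.partial[sock] + data).split(b"\n")
            self.partial[sock] = rest
            batch.extend((sock, line) for line in lines)
        if not batch:
            return 0
        # 2)-4) Lock once, execute the whole batch, unlock.
        with self.db_lock:
            self.lock_count += 1
            replies = [(sock, self.execute(line)) for sock, line in batch]
        # 5) Return responses (replies are tiny; a real server would buffer writes).
        for sock, reply in replies:
            sock.sendall(reply)
        return len(batch)

    def execute(self, line):
        # Toy find "<indexid>\t=\t1\t<key>" on an index opened with one column.
        fields = line.split(b"\t")
        value = self.table.get(fields[3])
        return b"0\t1\n" if value is None else b"0\t1\t" + value + b"\n"
```

With three connections that each send one request, a single `run_once` call should serve all three while taking the lock once, which is the (1/#conns) lock rate from the slide.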
  28. HandlerSocket writer thread
      [Diagram: the handlersocket writer thread]
      1) reads requests from many clients
      2) locks the DB and begins a transaction
      3) executes multiple requests
      4) commits and unlocks the DB
      5) returns responses to clients
      → executes multiple ops in a single transaction
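A toy model of the writer-thread loop on slide 28: each pass drains every queued modification, applies them to a private copy inside one "transaction", and commits once. If each durable commit costs roughly a fixed amount of time, sharing one commit among many operations raises write throughput in proportion to the batch size; that is one plausible reading of the gap on slide 29, not a measured breakdown. `ToyWriterThread` and its op tuples are hypothetical.

```python
from collections import deque


class ToyWriterThread:
    """Hypothetical writer: many modifications per transaction, one commit per pass."""

    def __init__(self):
        self.table = {}        # committed state: key -> value
        self.commits = 0       # each commit stands for one durable log flush

    def run_once(self, queue):
        batch = []
        while queue:
            batch.append(queue.popleft())
        if not batch:
            return []
        staged = dict(self.table)            # begin: work on a private copy
        results = []                         # rows modified per request
        for op, key, value in batch:
            if op == "+":                    # insert; a duplicate key modifies nothing here
                n = 0 if key in staged else 1
                staged.setdefault(key, value)
            elif op == "U":                  # update an existing row
                n = 1 if key in staged else 0
                if n:
                    staged[key] = value
            elif op == "D":                  # delete an existing row
                n = 1 if key in staged else 0
                staged.pop(key, None)
            else:
                raise ValueError("unknown op %r" % op)
            results.append(n)
        self.table = staged                  # commit: publish all changes at once
        self.commits += 1
        return results
```

Queueing 100 inserts and calling `run_once` once applies all of them with a single commit.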
  29. Write throughput
      • Conditions:
        - Durable write: sync_binlog = 1, innodb_flush_log_at_trx_commit = 1, innodb_support_xa = 1
        - Write-back cache with BBU, or SSD
      • Throughput:
        - MySQL: up to 1000 qps
        - HandlerSocket: up to 30000 qps
  30. How HandlerSocket locks tables
      • MyISAM:
        - Shared-exclusive lock
      • InnoDB:
        - Reader threads don't block
        - Only one writer thread can execute at a time
      • HandlerSocket requests are deadlock-free
        - Only simple operations are supported
  31. Client/server protocol
  32. MySQL C/S protocol
      When the following query is executed...
          SELECT column0, column1, column2, column3, column4 FROM db_1.table_1 where k = 15
      strace shows:
          write(3, "L\0\0\0\3select column0,column1,column2,column3,column4 from db_1.table_1 where k=15", 80) = 80
          read(3, "\1\0\0\1\0056\0\0\2\3def\4db_1\7table_1\7table_1\7column0\7column0"..., 16384) = 327
      (The response repeats def, db_1, table_1, table_1, columnN, columnN for each of the five columns before the row data.)
  33. HandlerSocket C/S protocol
      When an equivalent query is executed using HandlerSocket...
          write(3, "1\t=\t1\t15\n", 9) = 9
          read(3, "0\t5\t0\t1\t2\t3\t4\n", 8192) = 14

                     libmysql    handlersocket
          request    80 bytes    9 bytes
          response   327 bytes   14 bytes
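The 80-byte request in the strace on slide 32 can be reproduced by framing the query as a MySQL COM_QUERY packet: a 3-byte little-endian payload length, a 1-byte sequence id, the command byte 0x03, then the SQL text. The leading `L` is simply 76, the payload length. The helper name is hypothetical, and the sketch assumes the payload is shorter than 16 MB (one packet).

```python
import struct

HS_REQUEST = b"1\t=\t1\t15\n"   # the HandlerSocket request from the strace above


def com_query_packet(sql, sequence_id=0):
    """Frame SQL text as a single MySQL COM_QUERY packet."""
    payload = b"\x03" + sql.encode("utf-8")          # 0x03 = COM_QUERY
    header = struct.pack("<I", len(payload))[:3] + bytes([sequence_id])
    return header + payload


QUERY = ("select column0,column1,column2,column3,column4 "
         "from db_1.table_1 where k=15")
```

`len(com_query_packet(QUERY))` is 80 while `len(HS_REQUEST)` is 9: most of the difference is the repeated column names in the SQL text, which HandlerSocket sends only once, in the `P` (open index) request.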
  34. MySQL C/S protocol
      • The strace result shows that the MySQL C/S protocol is verbose
      • Result-set metadata:
        http://forge.mysql.com/wiki/MySQL_Internals_ClientServer_Protocol#F
      • Result-set metadata becomes very large if a result set has many columns
      • Neither a HANDLER statement nor a server-side prepared statement helps avoid this problem
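The 327-byte response can be accounted for with the classic text-protocol result-set layout (without CLIENT_DEPRECATE_EOF): a column-count packet, one column-definition packet per column, an EOF packet, one packet per row, and a final EOF packet. Each column definition carries six length-encoded strings (catalog `def`, schema, table, original table, name, original name), which is where the repeated `db_1`/`table_1`/`columnN` in the strace come from. The sketch assumes ASCII names, no NULL values, and the row values 0 to 4 seen in the HandlerSocket response on slide 33; the function names are hypothetical.

```python
HEADER = 4                  # 3-byte payload length + 1-byte sequence id
EOF_PACKET = HEADER + 5     # 0xfe marker, warning count (2), status flags (2)


def lenenc_int_size(n):
    """Bytes used by a MySQL length-encoded integer."""
    if n < 251:
        return 1
    if n < 1 << 16:
        return 3
    if n < 1 << 24:
        return 4
    return 9


def lenenc_str_size(s):
    return lenenc_int_size(len(s)) + len(s)


def column_def_size(schema, table, column):
    """ColumnDefinition41: six length-encoded strings plus 13 fixed bytes
    (0x0c, charset 2, column length 4, type 1, flags 2, decimals 1, filler 2)."""
    strings = ["def", schema, table, table, column, column]
    return HEADER + sum(lenenc_str_size(s) for s in strings) + 13


def result_set_size(schema, table, columns, rows):
    """Total bytes of a text-protocol result set (no NULLs, ASCII names)."""
    size = HEADER + lenenc_int_size(len(columns))            # column count
    size += sum(column_def_size(schema, table, c) for c in columns)
    size += EOF_PACKET                                         # end of metadata
    for row in rows:
        size += HEADER + sum(lenenc_str_size(v) for v in row)
    size += EOF_PACKET                                         # end of rows
    return size
```

For five columns the metadata alone is 304 of the 327 bytes. With 50 columns named column0 to column49, the same functions give about 3 KB of metadata for every single-row query, which is consistent with slide 10's observation that HandlerSocket gains most when many columns are retrieved.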
  35. Client libraries
  36. libhsclient
      • Client library for C++
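  37. Net::HandlerSocket
      • Client library for Perl
      • Invokes libhsclient via XS

        use strict; use warnings;
        use Net::HandlerSocket;
        # Port 9999 is the writer port; needed because the batch includes an update ('U')
        my $cli = Net::HandlerSocket->new({ host => 'localhost', port => 9999 });
        # Index 1 = PRIMARY key of db1.table1, returning columns k and v
        $cli->open_index(1, 'db1', 'table1', 'PRIMARY', 'k,v');
        my $res = $cli->execute_multi([
          [ 1, '=',  [ '33' ], 1, 0 ],                          # k = 33
          [ 1, '=',  [ '44' ], 1, 0, 'U', [ '44', 'hoge' ] ],   # set (k, v) = (44, 'hoge') where k = 44
          [ 1, '>=', [ '55' ], 10, 20 ],                        # k >= 55, limit 10, offset 20
        ]);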
  38. Configuration hints
  39. HandlerSocket configuration options
      • handlersocket_threads = 16
        - Number of reader threads
        - Recommended value: the number of logical CPUs
      • handlersocket_threads_wr = 1
        - Number of writer threads
        - Recommended value: ... 1
      • handlersocket_port = 9998
        - Listening port for reader requests
      • handlersocket_port_wr = 9999
        - Listening port for writer requests
  40. Other configuration options
      • innodb_buffer_pool_size
        - As large as possible
      • innodb_log_file_size, innodb_log_files_in_group
        - As large as possible
      • innodb_thread_concurrency = 0
      • open_files_limit = 65535
        - Number of file descriptors mysqld can open
        - HandlerSocket can handle up to 65000 concurrent connections
  41. Other configuration options
      • innodb_adaptive_hash_index = 1
        - The adaptive hash index is fast, but consumes memory
  42. Options related to durability
      • sync_binlog = 1
      • innodb_flush_log_at_trx_commit = 1
      • innodb_support_xa = 1
  43. Benchmark results
  44. Benchmark
      • Server:
        - Core2Quad Q6600
        - CentOS 5.4
        - Single EXPI9301CT (e1000e)
        - Single Intel X25-E (write-back cache disabled)
      • Schema:
        - CREATE TABLE table1 (k varchar(32) KEY, v varchar(32)) engine = INNODB;
      • Read benchmark:
        - 10000000 records
        - SELECT v from table1 where k = ?
        - Random access
      • Write benchmark:
        - 10000000 records
        - UPDATE table1 SET v = ? where k = ?
        - Random access
        - Durable write: sync_binlog = 1, innodb_flush_log_at_trx_commit = 1, innodb_support_xa = 1
  45. Throughput (reads)
      [Chart: queries per second (0 to 300,000) vs. number of concurrent connections (1 to 16384); series: handlersocket read, mysql read, handlersocket write, mysql write]
  46. Throughput (writes)
      [Chart: queries per second (log scale, 1 to 100,000) vs. number of concurrent connections (1 to 16384); series: handlersocket write, mysql write]
  47. Maximum response time
      [Chart: maximum response time in seconds (0 to 60) vs. number of concurrent connections (1 to 16384); series: handlersocket read, mysql read, handlersocket write, mysql write]
  48. Average response time
      [Chart: average response time in seconds (0 to 6) vs. number of concurrent connections (1 to 16384); series: handlersocket read, mysql read, handlersocket write, mysql write]
  49. Issues and future plans
  50. Issues
      • Difficult to build
        - Requires the MySQL source code
      • MySQL binary compatibility?
  51. Future plans
      • 'where' clause
      • Atomic read-modify-write operations
      • SQL support?
      • More language bindings
