HandlerSocket plugin for MySQL (English)

This slide deck is an English translation of http://www.slideshare.net/akirahiguchi/handlersocket-plugin-for-mysql-4664154

Transcript

  • 1. HandlerSocket plugin for MySQL
    Jun 29, 2010, DeNA Technology Seminar @ Yoyogi
    IT Platform Dept., System Management Division, DeNA Co., Ltd.
    Akira Higuchi <higuchi dot akira at dena dot jp>
  • 2. Who am I?
    - Akira Higuchi, Ph.D. in science
    - IT Platform Dept., DeNA Co., Ltd.
      - System-wide performance optimization
      - Middleware development
    - The creator of the HandlerSocket plugin
    - Using GNU/Linux since 1993
      - Fedora: yum install KoboDeluxe
      - Debian: apt-get install kobodeluxe
  • 3. About the HandlerSocket plugin
  • 4. What is HandlerSocket?
    - A non-SQL interface for MySQL
  • 5. What HandlerSocket aims for
    - Execute simple CRUD operations fast
      - Omit SQL parsing
      - Combine multiple requests on the server side
    - Allow SQL on the same database
      - Only simple operations are made faster
      - Seamless migration from SQL queries
  • 6. HandlerSocket plugin
    - Offers a direct, non-SQL interface to MySQL storage engines
    - Has its own TCP/IP listener
    - Speaks a text protocol
    - C++ and Perl client libraries are available
    - Works only on Linux
    - Source code: https://github.com/ahiguti/HandlerSocket-Plugin-for-MySQL
    - More information on the DeNA Tech Blog: http://engineer.dena.jp/ (in Japanese)
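  • 7. Architecture
    [Diagram: inside mysqld, the SQL Layer (with its Listener for libmysql) and the HandlerSocket Plugin both sit on top of the Handler Interface, which connects to InnoDB, MyISAM, and other storage engines. Client applications reach the SQL Layer through libmysql and the HandlerSocket Plugin through libhsclient.]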
  • 8. Other NoSQL interfaces to MySQL
    - mycached
      - http://developer.cybozu.co.jp/kazuho/2009/08/myc
      - Works with any storage engine
      - Speaks the memcached protocol
    - NDB API
      - http://dev.mysql.com/doc/ndbapi/en/index.html
      - Dedicated to the ndbcluster engine
  • 9. Performance
  • 10. Performance
    [Bar chart: requests/sec, handlersocket vs. libmysql, retrieving 1 column and 50 columns. Data labels: handlersocket 241,009 (1 column) and 159,407 (50 columns); libmysql 60,191 (1 column) and 15,771 (50 columns).]
    - HandlerSocket executes simple read queries 4x faster than mysqld/libmysql
    - It is especially effective when many columns are retrieved
    - The reason is described later
  • 11. Commands supported by HandlerSocket (for reading data)
    - In pseudo-SQL:
        SELECT f1, ..., fn FROM db.table
        WHERE (k1, ..., km) = (v1, ..., vm)
        ORDER BY index_i LIMIT offset, limit
    - (k1, ..., km) are the key columns of index_i (or a prefix of them)
    - =, >=, >, <=, and < can be used as the comparator
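To illustrate how the pseudo-SQL above maps onto the wire, here is a minimal Python sketch that builds the open-index (`P`) and find request lines. The token layout (`P <indexid> <db> <table> <index> <columns>` and `<indexid> <op> <vlen> <v1> ... <vn> [<limit> <offset>]`, tab-separated and newline-terminated) is our reading of the protocol notes in the HandlerSocket repository; the helper names are hypothetical, and escaping of control bytes is deliberately left out.

```python
# Hypothetical helpers; token layout assumed from the HandlerSocket protocol notes.
COMPARATORS = ("=", ">=", ">", "<=", "<")


def _line(tokens):
    """Join tokens with tabs and terminate with a newline."""
    text = [str(t) for t in tokens]
    # Bytes below 0x10 (tab, newline, ...) would need protocol escaping.
    if any(ord(ch) < 0x10 for t in text for ch in t):
        raise ValueError("control characters need protocol escaping (not handled here)")
    return "\t".join(text) + "\n"


def open_index_line(index_id, db, table, index, columns):
    """P <indexid> <db> <table> <index> <col1,col2,...>: bind index_id to an index."""
    return _line(["P", index_id, db, table, index, ",".join(columns)])


def find_line(index_id, op, key_prefix, limit=None, offset=0):
    """<indexid> <op> <vlen> <v1> ... <vn> [<limit> <offset>]

    key_prefix holds values for the first len(key_prefix) columns of the index.
    If limit is None the LIM part is omitted and the server default applies.
    """
    if op not in COMPARATORS:
        raise ValueError(f"unsupported comparator: {op!r}")
    tokens = [index_id, op, len(key_prefix), *key_prefix]
    if limit is not None:
        tokens += [limit, offset]
    return _line(tokens)
```

Note that SQL's `LIMIT offset, limit` becomes `<limit> <offset>` on the wire, so `SELECT k, v FROM db1.table1 WHERE k >= '55' ORDER BY PRIMARY LIMIT 20, 10` would be `find_line(1, ">=", ["55"], limit=10, offset=20)` after `open_index_line(1, "db1", "table1", "PRIMARY", ["k", "v"])`.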
  • 12. Commands supported by HandlerSocket (for modifying data)
    - UPDATE, DELETE, and INSERT
    - Transactions are not supported
    - Modifications are recorded in the binary log in row-based format
    - Modifications are durable
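Modifications reuse the find syntax and append a modify operation. The Python sketch below builds update (`U`), delete (`D`), and insert (`+`) request lines; the helper names are hypothetical and the tab-separated layout is assumed from the protocol notes in the HandlerSocket repository, without escaping.

```python
# Hypothetical helpers; request layout assumed from the HandlerSocket protocol notes.
def _line(tokens):
    """Tab-join tokens and add the terminating newline (no escaping)."""
    return "\t".join(str(t) for t in tokens) + "\n"


def update_line(index_id, op, key_prefix, new_values, limit=1, offset=0):
    """Find rows as a read would, then overwrite the opened columns ('U')."""
    return _line([index_id, op, len(key_prefix), *key_prefix,
                  limit, offset, "U", *new_values])


def delete_line(index_id, op, key_prefix, limit=1, offset=0):
    """Find rows as a read would, then delete them ('D')."""
    return _line([index_id, op, len(key_prefix), *key_prefix, limit, offset, "D"])


def insert_line(index_id, values):
    """Insert one row; values follow the column list given to 'P'."""
    return _line([index_id, "+", len(values), *values])
```

For example, `update_line(1, "=", ["44"], ["44", "hoge"])` yields the same request as the `'U'` entry in the Perl `exec_multi` example.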
  • 13. Command example
    - create table db1.table1 (k int key, v char(20));
    - insert into db1.table1 values (234, 'foo'), (678, 'bar');

      $ telnet localhost 9998
      Trying 127.0.0.1...
      Connected to localhost.
      Escape character is '^]'.
      P 0 db1 table1 PRIMARY k,v      (opens the PK)
      0 1
      0 = 1 234                       (find k = 234)
      0 2 234 foo
      0 = 1 678                       (find k = 678)
      0 2 678 bar

    (Fields on each line are separated by tabs.)
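The replies in the session above have the form `<errorcode> <numcolumns> <values...>`; our understanding is that multiple rows arrive concatenated on one line. A minimal Python parser sketch under that assumption (the name `parse_response` is hypothetical, and escaped bytes are not decoded):

```python
def parse_response(line):
    """Parse a tab-separated reply: errorcode, numcolumns, then the values.

    Values of several rows are assumed to arrive concatenated, so they are
    regrouped into rows of numcolumns fields each. No unescaping is done.
    """
    fields = line.rstrip("\n").split("\t")
    code, ncols = int(fields[0]), int(fields[1])
    values = fields[2:]
    if ncols <= 0:
        return code, []
    rows = [values[i:i + ncols] for i in range(0, len(values), ncols)]
    return code, rows
```

Applied to the session, `"0\t1\n"` (the reply to `P`) gives `(0, [])`, and `"0\t2\t234\tfoo\n"` gives `(0, [["234", "foo"]])`.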
  • 14. Why fast?
    - No SQL parsing → low CPU usage
    - Executes multiple requests in bulk → low CPU and disk usage
    - Own client/server protocol → small network transmission size
  • 15. Eliminating CPU consumption
  • 16. oprofile results – libmysql/mysqld
    - Executes "SELECT v FROM table WHERE k = ?" many times

      samples   %        binary
      9669940   53.1574  mysqld
      4438098   24.3970  vmlinux
      1835976   10.0927  libpthread-2.5.so
      1680656    9.2389  libc-2.5.so
       397970    2.1877  e1000e
        89136    0.4900  oprofiled
        42881    0.2357  oprofile
  • 17. oprofile results – libmysql/mysqld
    CPU usage inside mysqld:

      samples   %       symbol name
      748022    7.7355  MYSQLparse(void*)
      219702    2.2720  my_pthread_fastmutex_lock
      205606    2.1262  make_join_statistics(JOIN*, TABLE_LIST*,
      198234    2.0500  btr_search_guess_on_hash
      180731    1.8690  JOIN::optimize()
      177120    1.8317  row_search_for_mysql
      171185    1.7703  lex_one_token(void*, void*)
      162683    1.6824  alloc_root
      131823    1.3632  read_view_open_now
      122795    1.2699  mysql_select(THD*, Item***, TABLE_LIST*,
      100276    1.0370  open_table(THD*, TABLE_LIST*, st_mem_root*,
       99575    1.0297  mem_pool_fill_free_list
       96434    0.9973  build_template(row_prebuilt_struct*, THD*,
  • 18. oprofile results – libmysql/mysqld
    CPU usage inside the Linux kernel:

      samples   %       symbol name
      204393    4.6054  schedule
      118648    2.6734  tcp_sendmsg
      115832    2.6099  tcp_recvmsg
      106537    2.4005  tcp_v4_rcv
      103915    2.3414  tcp_ack
      103534    2.3328  system_call
       93864    2.1150  dev_queue_xmit
       86831    1.9565  __mod_timer
       85891    1.9353  tcp_rcv_established
       84083    1.8946  .text.task_rq_lock
  • 19. oprofile results – libmysql/mysqld
    - libmysql/mysqld:
      - Much CPU time is spent in mysqld
      - Parsing SQL is slow
      - schedule() is called frequently
  • 20. oprofile results – HandlerSocket
    CPU usage inside MySQL with HandlerSocket:

      samples   %        binary
      1919039   51.0453  vmlinux
       811998   21.5987  mysqld
       421215   11.2041  libpthread-2.5.so
       207166    5.5105  e1000e
       191566    5.0955  handlersocket.so
       188618    5.0171  libc-2.5.so
        13622    0.3623  oprofiled
         5707    0.1518  oprofile
  • 21. oprofile results – HandlerSocket
    CPU consumption in mysqld:

      samples   %        symbol name
      119684    14.7394  btr_search_guess_on_hash
       58202     7.1678  row_search_for_mysql
       46946     5.7815  mutex_delay
       38617     4.7558  my_pthread_fastmutex_lock
       37707     4.6437  buf_page_get_known_nowait
       36528     4.4985  rec_get_offsets_func
       34625     4.2642  build_template(row_prebuilt_struct*, THD*, TABLE*,
       20024     2.4660  row_sel_store_mysql_rec
       19347     2.3826  btr_cur_search_to_nth_level
       16701     2.0568  row_sel_convert_mysql_key_to_innobase
       13343     1.6432  cmp_dtuple_rec_with_match
       11381     1.4016  ha_innobase::index_read(unsigned char*,
       11176     1.3764  dict_index_copy_types
       10762     1.3254  mtr_memo_slot_release
       10734     1.3219  ha_innobase::init_table_handle_for_HANDLER()
  • 22. oprofile results – HandlerSocket
    CPU consumption in the Linux kernel:

      samples   %       symbol name
      129038    6.7241  tcp_sendmsg
       80080    4.1729  tcp_v4_rcv
       69658    3.6298  dev_queue_xmit
       66171    3.4481  .text.skb_release_data
       63316    3.2994  __qdisc_run
       60279    3.1411  tcp_recvmsg
       59703    3.1111  ip_output
       58462    3.0464  .text.skb_release_head_state
       48876    2.5469  tcp_ack
       48733    2.5394  __alloc_skb
       45660    2.3793  ip_queue_xmit
       44671    2.3278  tcp_transmit_skb
  • 23. oprofile results – HandlerSocket
    - HandlerSocket:
      - Most CPU time is consumed in the kernel
      - schedule() is not called frequently
      - Inside mysqld, InnoDB consumes most of the CPU time
  • 24. Executing multiple requests in bulk
  • 25. Threading model
    mysqld:
    - Thread per connection (MySQL 5)
    - Thread pooling (MySQL 6?)
  • 26. Threading model
    HandlerSocket:
    - Small number of threads
    - Many connections per thread
      - Uses epoll()
    - Virtually unlimited number of concurrent connections
    - Small memory footprint
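The one-thread, many-connections model can be sketched with Python's `selectors` module, whose `DefaultSelector` is backed by epoll on Linux. This only illustrates the technique under simplifying assumptions (a hypothetical `LineServer` class, a caller-supplied `handle` function, and replies small enough to always fit in the socket send buffer); it is not HandlerSocket's C++ implementation.

```python
import selectors
import socket


class LineServer:
    """Hypothetical sketch: serve many line-based connections from one thread."""

    def __init__(self, handle):
        self.sel = selectors.DefaultSelector()  # epoll on Linux
        self.handle = handle                    # maps request bytes -> reply bytes
        self.pending = {}                       # per-connection partial input

    def add(self, conn):
        conn.setblocking(False)
        self.pending[conn] = b""
        self.sel.register(conn, selectors.EVENT_READ)

    def run_once(self, timeout=0.1):
        # One wakeup may report readiness on many connections at once.
        for key, _events in self.sel.select(timeout):
            conn = key.fileobj
            data = conn.recv(4096)
            if not data:  # peer closed the connection
                self.sel.unregister(conn)
                del self.pending[conn]
                conn.close()
                continue
            *lines, rest = (self.pending[conn] + data).split(b"\n")
            self.pending[conn] = rest
            for line in lines:
                # Assumes the reply fits in the send buffer; a real server
                # would queue output and wait for EVENT_WRITE.
                conn.sendall(self.handle(line) + b"\n")

    def close(self):
        for conn in list(self.pending):
            self.sel.unregister(conn)
            conn.close()
        self.pending.clear()
        self.sel.close()


def demo(n=3):
    """Drive the server with n in-process socket pairs; return each client's reply."""
    server = LineServer(lambda line: b"0\t" + line)
    clients = []
    for i in range(n):
        srv_end, cli_end = socket.socketpair()
        server.add(srv_end)
        cli_end.sendall(b"req%d\n" % i)
        clients.append(cli_end)
    server.run_once(timeout=1.0)
    replies = [c.recv(100) for c in clients]
    for c in clients:
        c.close()
    server.close()
    return replies
```

A single `run_once` call answers every ready connection, which is why the number of threads can stay small while the number of connections grows.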
  • 27. HandlerSocket reader thread
    1. Reads requests from many clients
    2. Locks the DB and gets a read view
    3. Executes many requests
    4. Unlocks the DB
    5. Returns responses to clients
    → Locks/unlocks only (1/#conns) times per request
  • 28. HandlerSocket writer thread
    1. Reads requests from many clients
    2. Locks the DB and begins a transaction
    3. Executes multiple requests
    4. Commits and unlocks the DB
    5. Returns responses to clients
    → Executes multiple operations in a single transaction
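The point of both loops is that locking (and, for the writer, committing) happens once per batch rather than once per request. A small Python simulation makes the count visible; `CountingLock`, `reader_batch`, `writer_batch`, and the `commit_log` list standing in for durable commits are all hypothetical stand-ins, not HandlerSocket code.

```python
import threading


class CountingLock:
    """A context-manager lock that counts how often it is taken."""

    def __init__(self):
        self._lock = threading.Lock()
        self.acquisitions = 0

    def __enter__(self):
        self._lock.acquire()
        self.acquisitions += 1
        return self

    def __exit__(self, exc_type, exc, tb):
        self._lock.release()
        return False


def reader_batch(requests, db_lock, lookup):
    """Lock once, run every pending read, unlock, then hand back all replies."""
    with db_lock:  # stands in for "lock the DB, get a read view"
        replies = [lookup(req) for req in requests]
    return replies  # responses go out after the lock is released


def writer_batch(requests, db_lock, apply, commit_log):
    """Apply every pending write inside one transaction, committing once."""
    with db_lock:  # stands in for "lock the DB, begin a transaction"
        results = [apply(req) for req in requests]
        commit_log.append(len(requests))  # one simulated commit for the whole batch
    return results
```

With N pending requests the lock is taken once instead of N times, which is the (1/#conns) figure on the reader slide; on the writer side, one commit covers the whole batch.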
  • 29. Write throughput
    - Conditions:
      - Durable writes
        - sync_binlog = 1
        - innodb_flush_log_at_trx_commit = 1
        - innodb_support_xa = 1
      - Write-back cache with BBU, or SSD
    - Throughput:
      - MySQL: up to 1000 qps
      - HandlerSocket: up to 30000 qps
  • 30. How HandlerSocket locks tables
    - MyISAM:
      - Shared-exclusive lock
    - InnoDB:
      - Reader threads don't block
      - Only one writer thread can run at a time
    - HandlerSocket requests are deadlock-free
      - Only simple operations are supported
  • 31. Client/server protocol
  • 32. MySQL C/S protocol
    When the following query is executed via libmysql:
      SELECT column0, column1, column2, column3, column4 FROM db_1.table_1 WHERE k = 15
    strace shows:
      write(3, "L\0\0\0\3select column0,column1,column2,column3,column4 from db_1.table_1 where k=15", 80) = 80
      read(3, "\1\0\0\1\0056\0\0\2\3def\4db_1\7table_1\7table_1\7column0\7column0"..., 16384) = 327
    (The 327-byte response repeats a column-definition packet naming db_1, table_1, and the column for each of column0 to column4, followed by the row data.)
  • 33. HandlerSocket C/S protocol
    When an equivalent query is executed using HandlerSocket:
      write(3, "1\t=\t1\t15\n", 9) = 9
      read(3, "0\t5\t0\t1\t2\t3\t4\n", 8192) = 14

                 libmysql     handlersocket
      request    80 bytes     9 bytes
      response   327 bytes    14 bytes
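Part of the size difference is that HandlerSocket sends no result-set metadata, only tab-separated fields. As we understand the protocol notes in the HandlerSocket repository, a NULL is sent as a single 0x00 byte, and bytes 0x00 to 0x0f inside a value are escaped as 0x01 followed by the byte plus 0x40. A Python sketch of that encoding (function names hypothetical), which also reproduces the 9-byte and 14-byte lengths from the trace:

```python
# Hypothetical helpers; escaping rules assumed from the HandlerSocket protocol notes.
def encode_field(value):
    """None -> b'\\x00'; bytes 0x00-0x0f -> 0x01, byte + 0x40; others unchanged."""
    if value is None:
        return b"\x00"
    raw = value.encode("utf-8") if isinstance(value, str) else bytes(value)
    out = bytearray()
    for b in raw:
        if b < 0x10:
            out += bytes((0x01, b + 0x40))
        else:
            out.append(b)
    return bytes(out)


def decode_field(field):
    """Reverse encode_field; the lone NUL marker decodes to None."""
    if field == b"\x00":
        return None
    out = bytearray()
    it = iter(field)
    for b in it:
        if b == 0x01:
            out.append(next(it) - 0x40)  # escaped control byte
        else:
            out.append(b)
    return bytes(out)


def encode_line(fields):
    """Encode each field, join with tabs, and terminate with a newline."""
    return b"\t".join(encode_field(f) for f in fields) + b"\n"
```

Because tab and newline are themselves below 0x10, escaping them is what keeps the simple line-and-tab framing unambiguous.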
  • 34. MySQL C/S protocol
    - The strace result shows that the MySQL C/S protocol is verbose
      - Result-set metadata: http://forge.mysql.com/wiki/MySQL_Internals_ClientServer_Protocol#F
      - Result-set metadata becomes very large if a result set has many columns
    - Neither a HANDLER statement nor a server-side prepared statement helps avoid this problem
  • 35. Client libraries
  • 36. libhsclient
    - Client library for C++
  • 37. Net::HandlerSocket
    - Client library for Perl
    - Invokes libhsclient via XS

      use Net::HandlerSocket;

      my $cli = new Net::HandlerSocket({host => 'localhost', port => 9999});
      $cli->open_index(1, 'db1', 'table1', 'PRIMARY', 'k,v');
      my $res = $cli->exec_multi([
        [ 1, '=',  [ '33' ], 1, 0 ],
        [ 1, '=',  [ '44' ], 1, 0, 'U', [ '44', 'hoge' ] ],
        [ 1, '>=', [ '55' ], 10, 20 ],
      ]);
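`exec_multi` lets the client submit several requests in one call. The same idea in Python, as a sketch that is not the Net::HandlerSocket implementation: write all request lines in one `sendall`, then read one reply line per request, in order. The `exec_multi` and `demo` functions here are hypothetical, escaping is omitted, and the canned replies in `demo` are made up for illustration.

```python
import socket


def exec_multi(sock, requests):
    """Pipeline several HandlerSocket-style requests over one connection.

    All request lines go out in a single sendall(); then exactly one reply
    line is read per request. Tokens are assumed to contain no bytes below
    0x10, so protocol escaping is omitted.
    """
    payload = "".join("\t".join(str(t) for t in req) + "\n" for req in requests)
    sock.sendall(payload.encode("utf-8"))
    replies = []
    with sock.makefile("rb") as rfile:  # closing the file leaves sock open
        for _ in requests:
            line = rfile.readline()
            if not line:
                raise ConnectionError("connection closed before all replies arrived")
            replies.append(line.rstrip(b"\n").decode("utf-8").split("\t"))
    return replies


def demo():
    """Run exec_multi against an in-process peer holding canned replies."""
    client, server = socket.socketpair()
    # Canned replies, made up for illustration: one line per request.
    server.sendall(b"0\t2\t33\tfoo\n0\t1\t1\n0\t2\t55\tbar\n")
    replies = exec_multi(client, [
        [1, "=", 1, "33", 1, 0],                     # find k = '33'
        [1, "=", 1, "44", 1, 0, "U", "44", "hoge"],  # update where k = '44'
        [1, ">=", 1, "55", 10, 20],                  # k >= '55', limit 10, offset 20
    ])
    sent = server.recv(4096)
    client.close()
    server.close()
    return replies, sent
```

Sending the batch before reading any reply means the client pays one network round trip for the whole batch instead of one per request.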
  • 38. Configuration hints
  • 39. HandlerSocket configuration options
    - handlersocket_threads = 16
      - Number of reader threads
      - Recommended value: the number of logical CPUs
    - handlersocket_thread_wr = 1
      - Number of writer threads
      - Recommended value: ... 1
    - handlersocket_port = 9998
      - Listening port for reader requests
    - handlersocket_port_wr = 9999
      - Listening port for writer requests
  • 40. Other configuration options
    - innodb_buffer_pool_size
      - As large as possible
    - innodb_log_file_size, innodb_log_files_in_group
      - As large as possible
    - innodb_thread_concurrency = 0
    - open_files_limit = 65535
      - Number of file descriptors mysqld can open
      - HandlerSocket can handle up to 65000 concurrent connections
  • 41. Other configuration options
    - innodb_adaptive_hash_index = 1
      - The adaptive hash index is fast but consumes memory
  • 42. Options related to durability
    - sync_binlog = 1
    - innodb_flush_log_at_trx_commit = 1
    - innodb_support_xa = 1
  • 43. Benchmark results
  • 44. Benchmark
    - Server:
      - Core2Quad Q6600
      - CentOS 5.4
      - Single EXPI9301CT (e1000e)
      - Single Intel X25-E (write-back cache disabled)
    - Schema:
      - CREATE TABLE table1 (k varchar(32) KEY, v varchar(32)) ENGINE = InnoDB;
    - Read benchmark:
      - 10,000,000 records
      - SELECT v FROM table1 WHERE k = ?
      - Random access
    - Write benchmark:
      - 10,000,000 records
      - UPDATE table1 SET v = ? WHERE k = ?
      - Random access
      - Durable writes
        - sync_binlog = 1
        - innodb_flush_log_at_trx_commit = 1
        - innodb_support_xa = 1
  • 45. Throughput (reads)
    [Chart: queries/sec (0 to 300,000) vs. # of concurrent connections (1 to 16384); series: handlersocket read, mysql read, handlersocket write, mysql write]
  • 46. Throughput (writes)
    [Chart: queries/sec on a log scale (1 to 100,000) vs. # of concurrent connections (1 to 16384); series: handlersocket write, mysql write]
  • 47. Maximum response time
    [Chart: maximum response time in seconds (0 to 60) vs. # of concurrent connections (1 to 16384); series: handlersocket read, mysql read, handlersocket write, mysql write]
  • 48. Average response time
    [Chart: average response time in seconds (0 to 6) vs. # of concurrent connections (1 to 16384); series: handlersocket read, mysql read, handlersocket write, mysql write]
  • 49. Issues and future plans
  • 50. Issues
    - Difficult to build
      - Requires the MySQL source code
    - MySQL binary compatibility?
  • 51. Future plans
    - 'where' clause
    - Atomic read-modify-write operations
    - SQL support?
    - More language bindings
