Easy MySQL Database Sharding with CUBRID SHARD - 2013 Percona


If you ask companies that operate mission-critical services, they will tell you:

1) that a relational database system is still the best choice for mission-critical data;
2) that service availability is more important than performance;
3) that high performance is good, but predictable performance is king.

We know this from experience. At NHN we run over 30,000 Web servers behind more than 150 large-scale Web and mobile services. At that scale we must know what scales, how to provide high availability, and how to operate at predictable speed.

At Percona Live MySQL Conference 2013 I will talk about CUBRID SHARD, a universal database sharding solution for CUBRID, MySQL, and Oracle. CUBRID SHARD can be used with a heterogeneous database backend, i.e. some shards can be stored in CUBRID, some in MySQL or even Oracle. At NHN we deploy various combinations: MySQL only, MySQL + Oracle, MySQL + CUBRID, CUBRID only, and Oracle only. I will explain how DBAs can easily configure it, and how we have implemented this feature.

CUBRID SHARD lets you store an unlimited number of database shards and distribute data based on modulo, DATETIME, or hash/range calculations. Developers can even plug in their own library to calculate the SHARD_ID with a custom algorithm. At the session I will show how easy it is to set all this up. No third-party management tool is needed. With CUBRID SHARD, application developers do not need to modify the application logic to shard data. This is the DBA's job, as all of it is handled by the database system automatically.
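As a rough illustration of the modulo and range strategies, the sketch below maps an integer shard key to a shard ID. It is not CUBRID SHARD's internal code: the names `shard_range`, `RANGES`, `MODULUS`, and `shard_id_for_key` are hypothetical, and the range table simply copies the shard_keys.txt example shown in the slides (modulo 256, four ranges).

```c
#include <stddef.h>

/* One row of a key-range table: hash results in [min, max] go to shard_id. */
struct shard_range {
    int min;
    int max;
    int shard_id;
};

/* Hypothetical table mirroring the shard_keys.txt example from the slides. */
static const struct shard_range RANGES[] = {
    {0, 63, 0}, {64, 127, 1}, {128, 191, 2}, {192, 255, 3},
};

/* Assumed modulus; the slides name modulo 256 as the default strategy. */
#define MODULUS 256

/* Returns the shard ID for a non-negative integer key, or -1 if no range
 * matches (negative keys are rejected to keep the sketch simple). */
int shard_id_for_key(long key)
{
    if (key < 0)
        return -1;

    int hash = (int)(key % MODULUS);
    for (size_t i = 0; i < sizeof RANGES / sizeof RANGES[0]; i++) {
        if (hash >= RANGES[i].min && hash <= RANGES[i].max)
            return RANGES[i].shard_id;
    }
    return -1;
}
```

With this table, a key of 100 hashes to 100 and lands in the 64..127 range, so it belongs to shard 1. Keeping the modulus fixed and editing only the range table is what lets the mapping live in a configuration file rather than in application code.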

CUBRID SHARD is designed to be very efficient. It provides built-in distributed load balancing as well as connection and statement pooling. At the conference I will present several cases where CUBRID SHARD is deployed as a shard manager and a connection manager, or where it is used as a means of seamless data migration between different systems.

Who should come to the session?

If you run a service that spends money on a database solution, or on tools to shard databases or manage connections, come and learn how CUBRID SHARD can give your applications native scale-out through a single database view.

  • To keep this presentation simple, I will focus on three main things.
  • Ask your questions as I'm going through the presentation. Hold longer questions until the end. Do not hesitate to talk to me during the conference. Follow up by email.
  • We have researched how other companies manage big data. They still rely on relational databases and manage their data through data sharding. At NHN we ended up doing the same.
  • These are the existing sharding solutions.
  • Among open source RDBMS solutions, MySQL doesn’t provide database sharding out of the box. Google had to significantly change MySQL replication to make it work similarly. But at the time Sun, the former owner of MySQL, didn’t accept Google’s changes, resulting in a fork from mainstream without mainstream support. Twitter has recently opened their MySQL fork. http://www.oracle.com/technetwork/database/features/availability/300461-132370.pdf
  • Spider for MySQL requires changing the storage engine. That is not an option for us: we have running services and don’t want to change anything.
  • Application-aware architecture of dbShards is explained here http://www.dbshards.com/blog/2010/09/black-box-vs-application-aware-sharding/.
  • NHN has many large services in production, so a critical requirement was that any sharding middleware we add must not decrease the overall performance of the service.
  • Spock Proxy and CUBRID SHARD have a somewhat similar architecture. Both are lightweight and flexible.
  • CUBRID SHARD needs no additional SQL parsing because the shard key is marked with a SQL hint.
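To show why a hint is cheaper than full SQL parsing, here is a minimal sketch of a hint search: a plain substring scan that finds where the shard key value starts. This is an assumption about the general idea, not CUBRID SHARD's actual implementation. The function name `find_shard_key_value` and the macro `SHARD_KEY_HINT` are hypothetical, and the sketch ignores quoting, so a hint inside a string literal would also match.

```c
#include <ctype.h>
#include <string.h>

/* Hint text as it appears in the queries on the slides. */
#define SHARD_KEY_HINT "/*+ shard_key */"

/* Returns the byte offset of the first non-space character after the hint
 * (the bind marker or literal that carries the shard key), or -1 when the
 * query has no hint or nothing follows it. No SQL grammar is involved. */
long find_shard_key_value(const char *sql)
{
    const char *hint = strstr(sql, SHARD_KEY_HINT);
    if (hint == NULL)
        return -1;

    const char *p = hint + strlen(SHARD_KEY_HINT);
    while (*p != '\0' && isspace((unsigned char)*p))
        p++;

    return (*p != '\0') ? (long)(p - sql) : -1;
}
```

For the query used later in the slides, the returned offset points at the `?` bind marker, so the middleware only has to read the bound value and apply the sharding rule to it.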

    1. 1. Easy MySQL Database Sharding with CUBRID SHARD. Esen Sagynov, April 24, 2013
    2. 2. Today
        1. About NHN
        2. Sharding in Production
        3. Why CUBRID SHARD
        4. How to shard MySQL databases
        5. DEMO
        6. CUBRID SHARD in Ndrive
    3. 3. About me
        • Esen Sagynov (NHN Corp.)
            – @CUBRID
            – fb.com/cubrid
            – esen@cubrid.org
    4. 4. About NHN
    5. 5. Sharding in Production
        • Uses RDBMS with Sharding
        • Data is stored as simple Key-Value.
    6. 6. Sharding Solutions
        Name | Type | Requirements (DB) | Requirements (ETC) | Interface
        Hibernate shards | AS framework | DBMS w/ Hibernate support | Hibernate, JVM | Java
        HiveDB | AS framework | MySQL | Hibernate, JVM | Java
        dbShards | AS & Middleware | MySQL | - | Java, C, PHP, Python, Ruby
        Gizzard (Twitter) | Middleware | Any storage | JVM | Java
        Spider for MySQL | Middleware & Storage Engine | MySQL | - | Any
        Spock Proxy | Middleware | MySQL | - | Any
        Shard-Query | Middleware | MySQL | - | PHP, RESTful API
        CUBRID SHARD | Middleware | CUBRID, MySQL, Oracle | - | Any
    7. 7. Sharding Solution Categories
        • Application layer
        • Storage layer
        • Heavy middleware
        • Lightweight middleware
    8. 8. Application & Storage Layers
        Application Layer
        • Hibernate Shards
        • HiveDB
        Disadvantage
        • Requires Hibernate/Java
        • Uses many XML files for configuration
        • Not for running services
        Storage Layer
        • Spider for MySQL
        Disadvantage
        • Requires changing the storage engine
        • Not for running services
    9. 9. Heavy Middleware: dbShards, Gizzard
        • Requires changing application code
        • Requires agents to be installed on each DB server
        • Not for running services
        • Not active
    10. 10. Lightweight Middleware
        • Spock Proxy
            – Active project
            – Lightweight
            – Flexible
            – Easy to configure
            – No application change
    11. 11. Spock Proxy
        Sharding rule storage: Database
        Sharding strategy: Modulo
        Determine Sharding Key: Full SQL Parsing
        Strength: No need to change SQL
        Weakness:
        • Performance degradation:
            – Extra SQL parsing
            – Resultset merging
        • Not all MySQL SQL is supported
        • Single threaded
        Blog post: http://www.cubrid.org/blog/dev-platform/database-sharding-platform-at-nhn/
    12. 12. Spock Proxy Performance
        • Single threaded
        • Parses and rewrites SQL
        [Chart: Exec. time (0 to 500) vs. Concurrent clients (1 to 400) for App Sharding, Spock Proxy, and CUBRID SHARD]
    13. 13. Spock Proxy
        • Active project
        • Lightweight
        • Flexible
        • Easy to configure
        • No application change
        ✕ No performance impact
    14. 14. Lightweight, Easy to Configure Sharding Middleware: CUBRID SHARD
    15. 15. Spock Proxy vs. CUBRID SHARD
        Property | Spock Proxy | CUBRID SHARD
        Sharding rule storage | Database | Configuration file
        Sharding strategy | Modulo | Modulo; User defined hash function
        Determine Sharding Key | Full SQL Parsing | SQL Hint Search
        Strength | No need to change SQL | Supports CUBRID and MySQL; Full MySQL SQL support; Higher performance; No SQL parsing; Multi-threaded; Connection pooling; Load balancing; Custom sharding strategy; Easy configuration
        Weakness | Performance degradation (extra SQL parsing, resultset merging); Supports MySQL only; Not all MySQL SQL is supported; Single threaded | Requires changing SQL queries to insert the sharding hint
        Blog post: http://www.cubrid.org/blog/dev-platform/database-sharding-platform-at-nhn/
    16. 16. CUBRID Facts
        • RDBMS
        • True Open Source @ www.cubrid.org
        • Optimized for Web services
        • High performance
        • Large DB support
        • High-Availability feature
        • DB Sharding support
        • 90+% MySQL compatible SQL syntax + Oracle analytical functions
        • ACID Transactions
        • Online Backup
        • Supported by NHN Corporation
    17. 17. CUBRID SHARD Architecture [diagram: single database view]
    18. 18. SHARD Environment [diagram]
    19. 19. Installing CUBRID SHARD is easy!
    20. 20. Easy Installation
        • http://www.cubrid.org/downloads
        • apt-get
        • yum
        • chef ⭐
        • VM
        • EC2 AMI
        • cloud service
        Doc page: http://www.cubrid.org/wiki_tutorials/entry/cubrid-installation-instructions
    21. 21. Configuring is very easy and intuitive!
    22. 22. Configuration Steps
        1. Create shards
        2. Create database users
        3. Create database schema
        4. Configure CUBRID SHARD
            – shard database information
            – backend shards connection information
            – sharding strategy
        5. Start CUBRID SHARD
        6. Change application code
            – connection URL
            – shard hint
    23. 23. # 1. Create Shards
        • Host 1..N:
        $> mysql -ushard -ppassword -hnode1
        mysql> CREATE DATABASE sharddb;
    24. 24. # 2. Create Users
        • Host 1..N:
        $> mysql -ushard -ppassword -hnode1
        mysql> USE mysql;
        mysql> GRANT ALL PRIVILEGES ON sharddb.* TO shard@localhost IDENTIFIED BY 'shard123';
        mysql> GRANT ALL PRIVILEGES ON sharddb.* TO shard@shardBrokerNode IDENTIFIED BY 'shard123';
    25. 25. # 3. Create same tables
        • Host 1..N:
        $> mysql -ushard -ppassword -hnode1
        mysql> USE sharddb;
        mysql> CREATE TABLE tbl_users (id BIGINT PRIMARY KEY, name VARCHAR(20), age SMALLINT);
        $> mysql -ushard -ppassword -hnode2
        …
    26. 26. # 4. Simple Configuration
        • shard.conf
            – Main configuration file for CUBRID SHARD.
        • shard_connection.txt
            – Predefined list of shard IDs, database and host names for CUBRID/MySQL.
        • shard_keys.txt
            – A list of shard_key_columns and their mapping with shard_id
        Doc page: http://www.cubrid.org/manual/91/en/shard.html#configuration-and-setup
    27. 27. shard.conf
        Set:
        1. SHARD_DB_NAME
        2. SHARD_DB_USER
        3. SHARD_DB_PASSWORD
        4. APPL_SERVER
        …
        SHARD_DB_NAME = sharddb
        SHARD_DB_USER = shard
        SHARD_DB_PASSWORD = shard123
        APPL_SERVER = CAS_MYSQL
        …
        Doc page: http://www.cubrid.org/manual/91/en/shard.html#default-configuration-file-shard-conf
    28. 28. shard_connection.txt
        Set:
        1. Shard ID
        2. Real database name
        3. Remote/local host name
        # shard-id real-db-name connection-info
        0 sharddb mysqlA:3306
        1 sharddb mysqlB:3306
        2 sharddb mysqlC:3306
        …
        ** Host names must be identical to the output of the hostname command on every node.
        Doc page: http://www.cubrid.org/manual/91/en/shard.html#setting-shard-metadata
    29. 29. shard_keys.txt
        Set:
        1. Min shard key
        2. Max shard key
        3. Shard ID
        [%student_no]
        # min max shard_id
        0 63 0
        64 127 1
        128 191 2
        192 255 3
        ** Default sharding strategy is to apply modulo 256 (SHARD_KEY_MODULAR in shard.conf).
        Doc page: http://www.cubrid.org/manual/91/en/shard.html#setting-shard-metadata
    30. 30. Custom Library
        int fn_shard_key_udf(int type, void *val)
        {
            int mod = 2;  /* modulus applied to integer keys */

            if (val == NULL)
                return ERROR_ON_ARGUMENT;

            switch (type) {
            case SHARD_U_TYPE_INT: {
                int ival = *(int *)val;
                return ival % mod;
            }
            case SHARD_U_TYPE_STRING:
                /* string keys are not handled in this example */
                return ERROR_ON_MAKE_SHARD_KEY;
            default:
                return ERROR_ON_ARGUMENT;
            }
            return ERROR_ON_MAKE_SHARD_KEY;
        }
        shard.conf
        1. SHARD_KEY_LIBRARY_NAME
        2. SHARD_KEY_FUNCTION_NAME
        [%student_no]
        SHARD_KEY_LIBRARY_NAME=$CUBRID/conf/shard_key_udf.so
        SHARD_KEY_FUNCTION_NAME=fn_shard_key_udf
        Doc page: http://www.cubrid.org/manual/91/en/shard.html#setting-user-defined-hash-function
    31. 31. # 5. Start CUBRID SHARD
        $> cubrid shard start
        @ cubrid shard start
        ++ cubrid shard start: success
    32. 32. # 6. Connection URL
        connectionURL = "jdbc:cubrid:localhost:45511:sharddb:shard:shard123:?althosts=node2:port2,node3:port3&loadBalance=true";
    33. 33. Querying Shards
        SELECT name FROM student WHERE student_no = /*+ shard_key */ ?;
    34. 34. Types of SQL Hints
    35. 35. Example query and key mapping
        String query = "SELECT name FROM student WHERE student_no = /*+ shard_key */ ?; ";
        PreparedStatement query_stmt = connection.prepareStatement(query);
        query_stmt.setInt(1, 100);
        ResultSet rs = query_stmt.executeQuery();
        // fetch resultset

        key_column | range min (hash result) | range max (hash result) | shard_id
        student_no | 0 | 63 | 0
        student_no | 64 | 127 | 1
        student_no | 128 | 191 | 2
        student_no | 192 | 255 | 3
    36. 36. MySQL Sharding DEMO
        Requirements:
        • 1GB free RAM
        • 3GB free space for 2 VMs
        • VirtualBox
        • Vagrant
    37. 37. MySQL Sharding DEMO: https://github.com/kadishmal/cubrid-shard-demo
    38. 38. CUBRID SHARD
        • Easy
            – No configuration hassle
            – No “moving parts”
        • Reliable
            – High performance
            – No SPOF
        • Open source
            – Supported by NHN
    39. 39. CUBRID SHARD Disadvantages
        • Need to alter SQL to add Hints
        • No Data Rebalancing
        • Need to carefully plan the sharding strategy in advance.
        • No GUI monitoring tool. Only command line.
    40. 40. CUBRID SHARD is great when…
        • Services are already running and stable
        • But data is growing fast
        • And you need a stable solution
        • Quick installation and easy configuration
        • Time constraints
    41. 41. Ndrive cloud storage service
        • User files meta data
        • Sharding strategy by user ID
        • 24 master shards
            – Intel(R) Xeon(R) L5640 @ 2.27GHz * 8, 16G RAM, 820G HDD
        • 10TB data
        • Load pattern:
            – 75~80% SELECT vs. 20~25% INSERT
            – Avg. ~3000 QPS/shard
            – Avg. ~5% CPU load/shard
    42. 42. Ndrive cloud storage service
        • 1 SHARD BROKER
        • 4 Proxies per Broker
        • 50 CAS per proxy
        • No performance degradation after CUBRID SHARD is used
        [Chart: x-axis Vuser (64, 128, 192, 256, 320)]
    43. 43. CUBRID SHARD Next
        • Auto-rebalancing in CUBRID SHARD
        • CM shard monitoring
        • Aggregation feature
    44. 44. Questions?
        • Esen Sagynov (NHN Corp.)
            – @CUBRID
            – fb.com/cubrid
            – esen@cubrid.org
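The Ndrive figures on slides 41 and 42 can be combined into two rough totals. The sketch below only multiplies the numbers quoted on the slides. The macro and function names are hypothetical, and the results are back-of-the-envelope estimates that assume the per-shard average applies to every master shard.

```c
/* Figures quoted on the Ndrive slides. */
#define MASTER_SHARDS      24
#define AVG_QPS_PER_SHARD  3000   /* "Avg. ~3000 QPS/shard" */
#define PROXIES_PER_BROKER 4
#define CAS_PER_PROXY      50

/* Approximate aggregate query rate across all master shards. */
long total_qps(void)
{
    return (long)MASTER_SHARDS * AVG_QPS_PER_SHARD;
}

/* Number of CAS processes behind the single broker. */
int cas_per_broker(void)
{
    return PROXIES_PER_BROKER * CAS_PER_PROXY;
}
```

Under these assumptions the deployment handles roughly 72,000 queries per second in total, served through 200 CAS processes behind one broker.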