Database Sharding the Right Way: Easy, Reliable, and Open source - HighLoad++ 2012
Presentation given by the CUBRID team at the Russian HighLoad++ Conference in October 2012. It covers Big Data management through database sharding. The CUBRID open source RDBMS provides native support for sharding, with load balancing, connection pooling, and automatic failover.

  • Self introduction.
  • CUBRID is a fully featured Relational Database Management System. CUBRID is not a typical community-backed open source project; it is backed by the largest IT corporation in South Korea.
  • Today I want to talk about the importance of relational database systems.
  • Nice NoSQL vs. RDBMS discussion on one of the Russian forums http://it-talk.org/post80487.html#p80487
  • In South Korea, Enterprise Business is even more dependent on Oracle database.
  • If you ask companies that operate mission-critical services, they will tell you: 1) that a relational database system is still the best choice for mission-critical data; 2) that service availability is more important than performance; 3) that high performance is good, but predictable performance is king. The engineers at the Box.com cloud storage platform also say they choose an RDBMS for mission-critical data.
  • We’ve developed Database Sharding in CUBRID! The difference between partitioning and sharding is that with partitioning you divide the data between multiple tables with identical schema within one database, while with sharding you divide the data between tables located in different databases. Sometimes the database gets so big that mere table partitioning is not enough; in fact, it will hinder the performance of the entire system. So we’d better add new databases, otherwise called shards. If HA is for READ distribution, sharding is for WRITE distribution, as you can write to different databases simultaneously. This is a feature most developers dream of having on the database side rather than in the application layer. Database sharding doesn’t just simplify developers’ lives, it also improves both application and database performance. The application gets rid of the sharding logic. The database reduces the index size. Win-win!
  • - Talking about open source RDBMS solutions, MySQL doesn’t provide database sharding out of the box.
    - Google had to significantly change MySQL replication to make it work similarly. But at the time Sun, the former owner of MySQL, didn’t accept Google’s changes, resulting in a fork from the mainstream without mainstream support.
    - Twitter has recently opened their MySQL fork.
    http://www.oracle.com/technetwork/database/features/availability/300461-132370.pdf
  • SHARD_KEY_MODULAR = 256
    SHARD_KEY_LIBRARY_NAME = string
    SHARD_KEY_FUNCTION_NAME = string
  • No additional SQL parsing because of HINT.
  • Eugen: When I started thinking about this presentation, this is the outcome that I wanted from it. For the experienced guys in the audience, these are the thoughts I want you to have at the end of this presentation. I want you to think that: some guys talked about some cool stuff they encountered in applications (don't remember what); there's a database that they use for this type of application, it's open source and saves a lot of trouble (don't remember what trouble exactly); they're really keen on doing things right. This is what I remember from every presentation that I’ve attended. Not the details. So I don’t expect you to remember the technical details. What I want is for you to grasp the concept of what we will talk about.

Presentation Transcript

  • Database Sharding the Right Way: Easy, Reliable, and Open source.
  • Growing in the Wild. The story by CUBRID Database Developers. View on Slideshare http://profyclub.ru/docs/439
  • =Big Business Opportunity
  • SQL: - Enterprise - Vendor dependency - Scalability constraints - Common interface
    NoSQL: - Open Source - Scalable - Non-standard API
  • SQL; Transactions (NoSQL => No ACID); Standard Interface; Experts
  • DBMS Market, $MM (Source: Gartner, 2012)
                   2009      2010      2011     Growth
    Worldwide    21,359    23,252    26,701     11.8%
    Korea           349       395       478     17%
    Ratio          1.6%      1.7%      1.8%
    [Chart: Korea vs. Worldwide, 2009 to 2011; y-axis 40% to 70%]
  • RDBMS is still the best choice for mission-critical data
  • Database Sharding
  • Name              | Type                        | Requirements: DB          | Requirements: ETC | Interface
    Hibernate shards  | AS framework                | DBMS w/ Hibernate support | Hibernate, JVM    | Java
    dbShards          | AS & Middleware             | MySQL                     |                   | Java, C
    Gizzard (Twitter) | Middleware                  | Any storage               | JVM               | Java
    Spider for MySQL  | Middleware & Storage Engine | MySQL                     |                   | Any
    CUBRID SHARD      | Middleware                  | CUBRID, MySQL, Oracle     |                   | Any
  • Is there such RDBMS?
  • CUBRID 9.0
  • Easy Installation
  • http://www.cubrid.org/downloads
  • SHARD_KEY_MODULAR = 256
    SHARD_KEY_LIBRARY_NAME = ''
    SHARD_KEY_FUNCTION_NAME = ''
  •  id  user_id=  order_no  …
  • /* User-defined shard key function: maps a shard key value to a shard key ID. */
    int user_get_shard_key(int type, void *val)
    {
        int mod = 2;  /* modulus applied to integer keys */

        if (val == NULL) {
            return ERROR_ON_ARGUMENT;
        }

        switch (type) {
        case SHARD_U_TYPE_INT: {
            int ival = *(int *)val;
            return ival % mod;
        }
        case SHARD_U_TYPE_STRING:
            /* String keys are not handled by this example. */
            return ERROR_ON_MAKE_SHARD_KEY;
        default:
            return ERROR_ON_ARGUMENT;
        }
    }
  • Configuring CUBRID SHARD is very easy!
  • $> cubrid createdb shard1
    $> csql -S -u dba shard1 -c "create user shard password 'shard123'"
    $> cubrid server start shard1
  • $> csql -C -u shard -p shard123 shard1@localhost -c "CREATE TABLE users (id BIGINT PRIMARY KEY, name VARCHAR(20), age SMALLINT)"
  • $> cubrid shard start
    @ cubrid shard start
    ++ cubrid shard start: success
  • connectionURL ="jdbc:cubrid:localhost:45511:shard1:shard:shard123:";
  • String query = "SELECT name FROM student WHERE student_no = /*+ shard_key */ ?;";
    PreparedStatement query_stmt = connection.prepareStatement(query);
    query_stmt.setInt(1, 100);
    ResultSet rs = query_stmt.executeQuery();
    // fetch resultset

    key_column   | range (hash result) min | max | shard_id
    student_no   |   0                     |  63 | 0
    student_no   |  64                     | 127 | 1
    student_no   | 128                     | 191 | 2
    student_no   | 192                     | 255 | 3
  • SELECT name FROM student WHERE student_no = /*+ shard_key */ ?;
  • How did we tackle the unique ID problem?
  • CUBRID SHARD Performance
  • Description                                     | Quantity | OS (64bit) / CPU / MEM
    Agent to generate load and NDrive App Simulator | 8        | Centos5.3 / xeon 2G-8core / 8G
    CUBRID Shard                                    | 1        | Centos5.3 / xeon 2.27G-16core / 24G
    CUBRID Broker                                   | 1        | Centos5.3 / xeon 2.27G-16core / 24G
    Meta DB                                         | 4        | Centos5.x / xeon 2.33G-4core / 8G
    User DB                                         | 1        | Centos5.3 / xeon 2.5G-8core / 8G
  • [Chart: Load Generator Performance; RPS vs. # of concurrent users, 32 to 512]
    [Chart: Performance trend when load is increased; proxy cpu, RPS, metadb TPS, and Mean Time (ms) vs. Vuser, 64 to 320]
  • - Similar performance up to 128 Vusers.
    - When SHARD is not used, 128 Vusers is the maximum.
    - When SHARD is used, as the number of Vusers increases, maximum performance is achieved along with shorter response time and lower CPU utilization.
  • TPC-C Performance Test
  • - AWS Xlarge instance: 7GB RAM, 20 EC2 units
    - Ubuntu 12.04 64-bit
    - CUBRID 9.0 (beta), no sharding
    - MySQL 5.5.28
    - Buffer: 2.8GB data_buffer_size, 2.8GB innodb_buffer_pool_size
    - Default configurations
  • [Chart: TPC-C Index, MySQL 5.5.28 vs. CUBRID 9.0; bar values 44.18 and 42.66]
  • What’s next for CUBRID?
  • www.cubrid.org
    Esen Sagynov, CUBRID Project Manager, esen@cubrid.org
    CUBRID Q&A: www.cubrid.org/questions
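
The range table on the shard_key slide (hash result 0 to 63 goes to shard 0, 64 to 127 to shard 1, and so on) can be read as a simple lookup. Below is a minimal sketch of that lookup in C. It assumes, without confirmation from the slides, that an integer key is hashed by taking it modulo SHARD_KEY_MODULAR; the names `shard_range`, `hash_shard_key`, and `lookup_shard_id` are hypothetical and are not part of CUBRID.

```c
#include <stddef.h>

/* Assumption: mirrors SHARD_KEY_MODULAR = 256 from the configuration slide. */
#define SHARD_KEY_MODULAR 256

/* One row of the range table: hash results in [min, max] map to shard_id. */
struct shard_range {
    int min;
    int max;
    int shard_id;
};

/* The four ranges listed on the slide for key column student_no. */
static const struct shard_range student_no_ranges[] = {
    {   0,  63, 0 },
    {  64, 127, 1 },
    { 128, 191, 2 },
    { 192, 255, 3 },
};

/* Hypothetical helper: reduce an integer key to 0..SHARD_KEY_MODULAR-1.
 * Assumed hashing scheme (plain modulo); negative keys are wrapped upward. */
int hash_shard_key(int key)
{
    int h = key % SHARD_KEY_MODULAR;
    return (h < 0) ? h + SHARD_KEY_MODULAR : h;
}

/* Hypothetical helper: return the shard id whose range contains the hash
 * of key, or -1 if no range matches. */
int lookup_shard_id(int key)
{
    int h = hash_shard_key(key);
    size_t n = sizeof student_no_ranges / sizeof student_no_ranges[0];

    for (size_t i = 0; i < n; i++) {
        if (h >= student_no_ranges[i].min && h <= student_no_ranges[i].max) {
            return student_no_ranges[i].shard_id;
        }
    }
    return -1;
}
```

Under these assumptions, the value 100 bound to student_no in the JDBC example hashes to 100, falls in the 64 to 127 range, and would be routed to shard 1. A linear scan is enough here because the table has only a handful of rows.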