Database Sharding the Right Way:
Easy, Reliable, and Open source.
Growing in the Wild. The story by
 CUBRID Database Developers.

             View on Slideshare

         http://profyclub.ru/docs/439
Big Business Opportunity

SQL
        - Enterprise
        - Vendor dependency
        - Scalability constraints
        - Common interface

NoSQL
        - Open Source
        - Scalable
        - Non-standard API
SQL

Transactions

NoSQL => NoACID

Standard Interface

Experts
DBMS Market ($MM)

               2009       2010       2011     Growth
Worldwide    21,359     23,252     26,701      11.8%
Korea           349        395        478        17%
Ratio          1.6%       1.7%       1.8%

[Chart: Korea vs. Worldwide, 2009 to 2011; y-axis from 40% to 70%]

Source: Gartner, 2012
RDBMS is still the best choice for
    mission-critical data
Database Sharding
Name                Type                          Requirements (DB)           Requirements (ETC)   Interface
Hibernate Shards    AS framework                  DBMS w/ Hibernate support   Hibernate, JVM       Java
dbShards            AS & Middleware               MySQL                                            Java, C
Gizzard (Twitter)   Middleware                    Any storage                 JVM                  Java
Spider for MySQL    Middleware & Storage Engine   MySQL                                            Any
CUBRID SHARD        Middleware                    CUBRID, MySQL, Oracle                            Any
Is there such RDBMS?
CUBRID 9.0
Easy Installation
http://www.cubrid.org/downloads
SHARD_KEY_MODULAR         = 256
SHARD_KEY_LIBRARY_NAME    = ‘’
SHARD_KEY_FUNCTION_NAME   = ‘’
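
To make the role of SHARD_KEY_MODULAR concrete, here is a minimal sketch in C. It assumes that, when no library and function name are configured, an integer shard key is simply taken modulo SHARD_KEY_MODULAR, which would match the 0 to 255 hash-result range used elsewhere in this deck. The function name shard_hash_result is hypothetical and is not part of CUBRID.

```c
#include <stdint.h>

/* Mirrors SHARD_KEY_MODULAR = 256 from the configuration slide. */
#define SHARD_KEY_MODULAR 256

/* Hypothetical helper (not a CUBRID API): fold an integer shard key
 * into the range 0 .. SHARD_KEY_MODULAR - 1.
 * C's % operator keeps the sign of the dividend, so the second
 * modulo step maps negative keys back into the valid range. */
int shard_hash_result(int64_t shard_key)
{
    int64_t m = SHARD_KEY_MODULAR;
    return (int)(((shard_key % m) + m) % m);
}
```

Under that assumption, a key of 100 yields a hash result of 100 and a key of 300 yields 44; the hash result is then matched against the configured key ranges to pick a shard.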
   id
       user_id
=      order_no
       …
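/* User-defined shard key function.
 * Returns a shard key id in the range 0 .. mod - 1, or one of the
 * ERROR_ON_* codes. SHARD_U_TYPE_* and ERROR_ON_* are provided by
 * CUBRID SHARD. */
int user_get_shard_key(int type, void *val)
{
    int mod = 2;    /* number of distinct shard key ids */

    if (val == NULL)
    {
           return ERROR_ON_ARGUMENT;
    }

    switch (type)
    {
           case SHARD_U_TYPE_INT:
           {
               int ival = *(int *) val;
               int id = ival % mod;
               /* % can be negative for negative keys; keep the id non-negative */
               return (id < 0) ? id + mod : id;
           }
           case SHARD_U_TYPE_STRING:
               /* string keys are not handled by this example */
               return ERROR_ON_MAKE_SHARD_KEY;
           default:
               return ERROR_ON_ARGUMENT;
    }
}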
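
The function above rejects string keys. As an illustration only, the sketch below shows one way a string branch could be filled in, using the well-known djb2 string hash. This is not CUBRID's method; the name sketch_string_shard_key, the constant SKETCH_SHARD_COUNT, and the -1 error value are all assumptions made so the example compiles on its own.

```c
#include <stddef.h>

/* Assumed shard key id count; matches mod = 2 in the slide's function. */
#define SKETCH_SHARD_COUNT 2

/* Hypothetical helper (not a CUBRID API): map a NUL-terminated string
 * key to an id in 0 .. SKETCH_SHARD_COUNT - 1 with the djb2 hash.
 * Returns -1 (a stand-in error value) when key is NULL. */
int sketch_string_shard_key(const char *key)
{
    unsigned long hash = 5381;

    if (key == NULL)
    {
        return -1;
    }

    /* djb2: hash = hash * 33 + next byte; unsigned overflow wraps safely */
    for (const unsigned char *p = (const unsigned char *) key; *p != '\0'; p++)
    {
        hash = hash * 33 + *p;
    }

    return (int) (hash % SKETCH_SHARD_COUNT);
}
```

Whatever hash is used, it has to be deterministic: the same key must always produce the same id, otherwise later reads would be routed to a different shard than the original write.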
Configuring CUBRID SHARD is very easy!

    $> cubrid createdb shard1
    $> csql -S -u dba shard1 -c "create user shard password 'shard123'"
    $> cubrid server start shard1

    $> csql -C -u shard -p 'shard123' shard1@localhost \
         -c "CREATE TABLE users (id BIGINT PRIMARY KEY, name VARCHAR(20), age SMALLINT)"
$> cubrid shard start
@ cubrid shard start
++ cubrid shard start: success
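import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

String connectionURL = "jdbc:cubrid:localhost:45511:shard1:shard:shard123:";
Connection connection = DriverManager.getConnection(connectionURL);

String query = "SELECT name FROM student WHERE student_no = /*+ shard_key */ ?;";
PreparedStatement query_stmt = connection.prepareStatement(query);
query_stmt.setInt(1, 100);   // bind the shard key value
ResultSet rs = query_stmt.executeQuery();
// fetch resultset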




key_column    range (hash result)    shard_id
               min        max
student_no       0         63           0
student_no      64        127           1
student_no     128        191           2
student_no     192        255           3
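
The range table can be read as a simple lookup from hash result to shard_id. The sketch below encodes the four rows shown above and scans them linearly. The struct and function names are hypothetical, and how CUBRID SHARD stores and searches these ranges internally is not described in this deck.

```c
#include <stddef.h>

/* One row of the key-range table shown above. */
struct shard_range {
    int min;       /* lowest hash result in the range, inclusive */
    int max;       /* highest hash result in the range, inclusive */
    int shard_id;  /* shard that owns this range */
};

/* The four rows listed for key_column student_no. */
static const struct shard_range STUDENT_NO_RANGES[] = {
    {   0,  63, 0 },
    {  64, 127, 1 },
    { 128, 191, 2 },
    { 192, 255, 3 },
};

/* Hypothetical lookup (not a CUBRID API): return the shard_id whose
 * range contains hash_result, or -1 if no range matches. */
int lookup_shard_id(int hash_result)
{
    size_t n = sizeof(STUDENT_NO_RANGES) / sizeof(STUDENT_NO_RANGES[0]);

    for (size_t i = 0; i < n; i++) {
        if (hash_result >= STUDENT_NO_RANGES[i].min &&
            hash_result <= STUDENT_NO_RANGES[i].max) {
            return STUDENT_NO_RANGES[i].shard_id;
        }
    }
    return -1;
}
```

For the JDBC example, student_no = 100 would give a hash result of 100 if the hash is a plain modulo 256 (an assumption), which falls in the range 64 to 127 and is therefore routed to shard 1.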
SELECT name FROM student WHERE
student_no = /*+ shard_key */ ?;

How did we tackle the
 unique ID problem?
CUBRID SHARD Performance
Description                                       Quantity   OS (64-bit) / CPU / MEM
Agent to generate load and NDrive App Simulator   8          CentOS 5.3 / Xeon 2 GHz, 8 cores / 8 GB
CUBRID Shard                                      1          CentOS 5.3 / Xeon 2.27 GHz, 16 cores / 24 GB
CUBRID Broker                                     1          CentOS 5.3 / Xeon 2.27 GHz, 16 cores / 24 GB
Meta DB                                           4          CentOS 5.x / Xeon 2.33 GHz, 4 cores / 8 GB
User DB                                           1          CentOS 5.3 / Xeon 2.5 GHz, 8 cores / 8 GB
Load Generator Performance
[Chart: RPS (0 to 100,000) vs. number of concurrent users (32 to 512)]

Performance trend when load is increased
[Chart: proxy CPU, RPS, metadb TPS, and Mean Time (ms) vs. number of users (64 to 320)]
- Performance is similar up to 128 Vusers.
    - Without SHARD, 128 Vusers is the maximum.
- With SHARD, as the number of Vusers increases:
    - maximum performance is reached, together with shorter response
      times and lower CPU utilization.

[Chart: x-axis Vuser, 64 to 320]
TPC-C Performance Test
• AWS Xlarge instance
    • 7GB RAM
    • 20 EC2 units
• Ubuntu 12.04 64-bit
• CUBRID 9.0 (beta), no sharding
• MySQL 5.5.28
• Buffer
    • 2.8GB data_buffer_size (CUBRID)
    • 2.8GB innodb_buffer_pool_size (MySQL)
• Default configurations
[Chart: TPC-C Index, MySQL 5.5.28 vs. CUBRID 9.0; bar values 42.66 and 44.18, y-axis from 30 to 46]
What’s next for CUBRID?




www.cubrid.org



Esen Sagynov
CUBRID Project Manager

esen@cubrid.org

CUBRID Q&A
www.cubrid.org/questions

Database Sharding the Right Way: Easy, Reliable, and Open source - HighLoad++ 2012


Editor's Notes

  • #3 Self introduction.
  • #6 CUBRID is a fully featured Relational Database Management System. CUBRID is not a typical community-backed open source project; it is backed by the largest IT corporation in South Korea.
  • #8 Today I want to talk about the importance of relational database systems.
  • #11 Nice NoSQL vs. RDBMS discussion on one of the Russian forums http://it-talk.org/post80487.html#p80487
  • #12 In South Korea, Enterprise Business is even more dependent on Oracle database.
  • #13 If you ask companies that operate mission-critical services, they will tell you:
    1) that a relational database system is still the best choice for mission-critical data;
    2) that service availability is more important than performance;
    3) that high performance is good, but predictable performance is king.
    The team behind the Box.com cloud storage platform also says they choose an RDBMS for mission-critical data.
  • #15 We’ve developed Database Sharding in CUBRID! With partitioning, you divide the data between multiple tables with an identical schema inside one database. With sharding, you divide the data between tables located in different databases. Sometimes a database gets so big that table partitioning alone is not enough and in fact hinders the performance of the entire system, so it is better to add new databases, otherwise called shards. If HA is for READ distribution, sharding is for WRITE distribution, as you can write to different databases simultaneously. Most developers dream of having this feature on the database side rather than in the application layer. Database sharding doesn’t just simplify developers’ lives, it also improves both application and database performance: the application gets rid of the sharding logic, and the database reduces its index size. Win-win!
  • #17 Talking about open source RDBMS solutions:
    – MySQL doesn’t provide database sharding out of the box.
    – Google had to significantly change MySQL replication to make it work similarly. But at the time Sun, the former owner of MySQL, didn’t accept Google’s changes, resulting in a fork from the mainstream without mainstream support.
    – Twitter has recently opened their MySQL fork.
    http://www.oracle.com/technetwork/database/features/availability/300461-132370.pdf
  • #27
    SHARD_KEY_MODULAR = 256
    SHARD_KEY_LIBRARY_NAME = string
    SHARD_KEY_FUNCTION_NAME = string
  • #38 No additional SQL parsing because of HINT.
  • #53 Eugen: When I started thinking about this presentation, this is the outcome that I wanted from it. For the experienced guys in the audience, these are the thoughts I want you to have at the end of this presentation. I want you to think that:
    – Some guys talked about some cool stuff they encountered in applications (don't remember what).
    – There's a database that they use for this type of application; it's open source and saves a lot of trouble (don't remember what trouble exactly).
    – They're really keen on doing things right.
    This is what I remember from every presentation that I’ve attended. Not the details. So I don’t expect you to remember the technical details. What I want is for you to grasp the concept of what we will talk about.