Growing in the Wild. The story by CUBRID Database Developers.
The presentation the CUBRID team gave at the Russian Internet Technologies Conference in 2012. It covers *WHY* CUBRID was developed, *WHY* the developers did not fork existing solutions, *WHY* it was necessary to develop a new RDBMS from scratch, and *HOW* the CUBRID Database has evolved over the years.

Published in: Technology

  1. Growing in the Wild. The story by CUBRID Database Developers. Esen Sagynov (@CUBRID), NHN Corporation, Service Platform Development Center. Eugen Stoianovici, NHN Corporation, CUBRID Development Lab. Monday, April 2, 2012.
  2. Who are we? • Eugen Stoianovici – CUBRID Engine Team – eugen.stoianovici@cubrid.org • Esen Sagynov @CUBRID – CUBRID Project Manager – esen.sagynov@nhn.com
  3. Purpose of this presentation. This is what I remember from every presentation that I’ve attended. Not the details. 1. “Some guys talked about some cool stuff they encountered in applications (don't remember what)” 2. “There's a database that they use for this type of applications, it's open source and saves from a lot of trouble (don't remember what trouble exactly).” 3. “They're really keen on doing things right.”
  4. You will learn… Reasons behind CUBRID development. What CUBRID has to offer. Benefits & advantages. What we have learnt so far. Where we are heading to.
  5. CUBRID Facts • RDBMS • True Open Source @ www.cubrid.org • Optimized for Web services • High performance 3-tier architecture • Large DB support • High-Availability feature • DB Sharding support • MySQL compatible SQL syntax • ACID Transactions • Online Backup
  6. Reasons Behind CUBRID Development
  7. [Map: Korea, Japan, USA, China] 30,000+ Web Servers, 150+ Web Services
  8. [Diagram: Korea, Japan, USA; iOS & Android] 30,000+ Web Servers, 150+ Web Services. Databases: Oracle, MSSQL, MySQL, CUBRID, NoSQL
  9. Disadvantages of existing solutions. 1. High License Cost: over 10,000 servers @ NHN. 2. Third-party solution: (a) no ownership of the code base; (b) additional $$$ for customizations; (c) branch tech support is not enough; (d) communication barriers w/ vendors; (e) slow updates & fixes.
  10. Fork or Start from Scratch? Fork: • No full ownership • Time to learn the code base • Fixed architecture • Understand the design philosophy. Start from scratch: • Full ownership • Time to develop • Custom more advanced architecture and design
  11. Benefits of in-house solution. Existing solutions: 1. High License Cost: over 10,000 servers @ NHN. 2. Third-party solution: (a) no ownership of the code base; (b) additional $$$ for customizations; (c) communication barriers w/ vendors; (d) slow updates & fixes. In-house solution: 1. No License Cost. 2. Core Technological Asset: (a) complete control of the code base; (b) no additional $$$ for customizations; (c) no communication barriers; (d) fast updates & fixes. 3. Key Storage Technology Skills: (a) grow our developers; (b) export developers. 4. New Database Solution Service: (a) provide CUBRID service to other platforms; (b) instant reaction to customer issues. 5. Recurring Key Technology: High-Availability, Sharding, Rebalancing, Cluster, etc.
  12. CUBRID Goal: Stability, Performance, Scalability, Ease of Use • Human vs. DB Errors • # of customers • Smart Index Optimizations • Shared Query Caching • Web Optimized Features • Load Balancer • High-Availability w/ auto fail-over • Sharding • Data Rebalancer • Cluster • SQL & API Compatibility • Native Migration Tool • Native GUI DB Management Tools • Monitoring Tools
  13. #1 Performance
  14. Client Requests: Performance UP! Types of Web Services (main operations: example): READ > 95%: News, Wiki, Blog, etc.; READ:WRITE = 70:30%: SNS, Push services, etc.; WRITE > 90%: Log monitoring, Analytics. 90% of Web Services. How & what to improve (CRUD: why?): SELECT: fast searching, avoid sequential scan and ORDER BY; INSERT: concurrent WRITE performance, reduce I/O, and fast searching; UPDATE: fast searching, improve lock mechanism; DELETE: fast searching.
  15. Phases: Phase 1 (v1.0 ~ 2.0), Phase 2 (v8.2.2), Phase 3 (v8.4.0), Phase 4 (v8.4.1), Phase 5 (Apricot), Phase 6 (Banana). SELECT Performance +; INSERT & DELETE Performance +; SELECT Performance ++; INSERT & UPDATE Performance ++; INSERT Performance +++; SELECT Performance ++++. Shared Query Plan Caching; Space Reusability Improvement; Covering Index, Key limit, etc.; Memory Buffer Mgmt. Improvements; Filter index, Skip index, etc.; Optimize JOINs; DB & Index Volume Optimizations; API Performance +; Windows Performance +. TPS: 15%, 10%, 270%, 70%. Smart Indexing. MySQL SELECT performance < CUBRID SELECT performance; MySQL INSERT performance < CUBRID INSERT performance.
  16. Random INSERT Performance. Schema: CREATE TABLE users( id INTEGER UNIQUE, username VARCHAR(255), last_posted INTEGER ); CREATE TABLE forum_posts( user_id INTEGER, post_moment INTEGER, post_text VARCHAR(64) ); CREATE INDEX i_forum_posts_post_moment ON forum_posts (post_moment); CREATE INDEX i_forum_posts_post_moment_user_id ON forum_posts (post_moment, user_id); Workload: SELECT username FROM users WHERE id = ?; INSERT INTO forum_posts(user_id, post_moment, post_text) VALUES (?, ?, ?); UPDATE users SET last_posted = ? WHERE id = ?;
  17. Random INSERT Performance • Users – 100,000 rows prepopulated • Test – CUBRID vNext (code name Apricot) – MySQL 5.5.21 – 40 workers – 1 hour – Record QPS every 2 minutes
  18. [Chart] Random INSERT Performance: CUBRID QPS decrease with DataSet size (y-axis: queries per second, 0 to 5000). Average = 3685, Max = 4469, Min = 2821
  19. [Chart] Random INSERT Performance: MySQL QPS decrease with DataSet size (y-axis: queries per second, 0 to 10000; x-axis ticks from 0 to 6,334,749). Average = 1796, Max = 8951, Min = 1122
  20. [Chart] Random INSERT Performance: PostgreSQL QPS decrease with DataSet size (y-axis: queries per second, 0 to 7000). Average = 594, Max = 6217, Min = 181
  21. [Chart] Random INSERT Performance: QPS decline over one hour (y-axis: queries per second, 0 to 10000). Series: MySQL QPS, CUBRID QPS, PostgreSQL QPS
  22. CUBRID Optimizations. Index Features: Reverse Index, Prefix Index, Function Index, Filter Index, Unique Index, Primary Key, Foreign Key. Query Features: Multi-range key limit, Index skip scan, Skip order by, Skip group by, Range Scan optimizations, Query rewrites, Covering Index, Descending Index. Server level optimizations: Log compression, Shared Query Plan cache, Locking Optimizations, Transaction concurrency.
  23. Filter Index • Interesting (open) tickets fit into a very small index. • No overhead for INSERT/UPDATE • Very fast results for open tickets. CREATE INDEX ON tickets(component, assignee) WHERE status = 'open'; SELECT title, component, assignee FROM users WHERE register_date > '2008-01-01' AND status = 'open';
  24. [Chart] QPS Filter vs. Full index (y-axis: queries per second, 0 to 7000; x-axis ticks from 0 to 10,000,000). Series: QPS Full Index, QPS Filter Index
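The filter index idea from slide 23 can be tried out with Python's bundled `sqlite3` module, which supports the same kind of index under the name "partial index". This is a sketch against SQLite, not CUBRID; the column list of `tickets`, the index name `i_tickets_open` and the sample rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tickets (id INTEGER PRIMARY KEY, title TEXT, "
    "component TEXT, assignee TEXT, status TEXT)"
)
# Partial ("filter") index: only rows with status = 'open' are indexed.
conn.execute(
    "CREATE INDEX i_tickets_open ON tickets (component, assignee) "
    "WHERE status = 'open'"
)
# 10,000 tickets, of which every 100th is open.
rows = [
    (f"ticket {i}", f"comp{i % 5}", f"dev{i % 3}", "open" if i % 100 == 0 else "closed")
    for i in range(10_000)
]
conn.executemany(
    "INSERT INTO tickets (title, component, assignee, status) VALUES (?, ?, ?, ?)",
    rows,
)


def query_plan(sql):
    """Return SQLite's plan description for a statement as one string."""
    return " ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


open_plan = query_plan(
    "SELECT title FROM tickets WHERE status = 'open' AND component = 'comp0'"
)
open_count = conn.execute(
    "SELECT COUNT(*) FROM tickets WHERE status = 'open'"
).fetchone()[0]
print(open_plan)
print(open_count)
```

The index only ever holds the open tickets, so it stays small no matter how many closed tickets accumulate, which is the point the first bullet on the slide makes.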
  25. CUBRID Architecture. API: CCI, JDBC, ADO.NET, OLEDB, ODBC, PHP, Perl, Python, Ruby. Broker: Query Parser, Query Optimizer, Query Planner. Server: Query Manager, Query Executor, Transaction Manager, Lock Manager, Log Manager, Storage Manager, File Manager.
  26. Parameterized Queries & Filter Index • PostgreSQL: will not use partial index • MS SQL Server: provides workaround • ORACLE: less flexible, has to be the exact expression • CUBRID: “Shared” Query Plan Cache. SELECT title, component, assignee FROM users WHERE register_date > ? AND status = ?; SELECT name, email FROM users WHERE register_date > ? AND age < ? AND age < 18;
  27. Query Plan Cache • PostgreSQL: cache a plan for the lifespan of a driver level prepared statement • MySQL: no query plan cache • CUBRID: “Shared” Query Plan Cache
  28. Query Plan Cache. Query execution without plan cache: Parse SQL → Name Resolving → Semantic check → Query Optimize → Query Plan → Query Execution. Query execution with plan cache: Parse SQL → Get Cached Plan → Query Execution.
  29. Auto Parameterization. SELECT title, component, assignee FROM users WHERE register_date > '2008-01-01' AND status = 'open'; becomes SELECT title, component, assignee FROM users WHERE register_date > ? AND status = ?;
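Slides 28 and 29 together describe a cache that is keyed on the parameterized form of a statement, so that queries differing only in their literals share one plan. A minimal Python sketch of that idea follows. It is not CUBRID's implementation: the regex-based `auto_parameterize`, the `PlanCache` class and the `compile_plan` callback are all hypothetical stand-ins, and a real engine would parameterize on the parsed statement rather than on raw text.

```python
import re

# Matches single-quoted string literals, or standalone numeric literals.
_LITERAL = re.compile(r"'(?:[^']|'')*'|\b\d+(?:\.\d+)?\b")


def auto_parameterize(sql):
    """Replace literals with '?' and return (template, extracted literals)."""
    values = []

    def repl(match):
        values.append(match.group(0))
        return "?"

    return _LITERAL.sub(repl, sql), values


class PlanCache:
    """Cache of plans shared by all statements with the same template."""

    def __init__(self, compile_plan):
        self._compile = compile_plan  # hypothetical stand-in for optimize + plan
        self._plans = {}
        self.hits = 0
        self.misses = 0

    def get(self, sql):
        template, values = auto_parameterize(sql)
        if template in self._plans:
            self.hits += 1
        else:
            self.misses += 1
            self._plans[template] = self._compile(template)
        return self._plans[template], values


cache = PlanCache(compile_plan=lambda template: ("PLAN", template))
q1 = ("SELECT title, component, assignee FROM users "
      "WHERE register_date > '2008-01-01' AND status = 'open';")
q2 = ("SELECT title, component, assignee FROM users "
      "WHERE register_date > '2011-06-30' AND status = 'closed';")
plan1, values1 = cache.get(q1)
plan2, values2 = cache.get(q2)
print(plan1[1])
print(cache.hits, cache.misses)
```

The second statement skips the compile step entirely, which corresponds to the shorter "with plan cache" path on slide 28.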
  30. #2 Scalability
  31. Scalability challenges • How to synchronize? – Async • Load balancing? – Third-party solution • Who handles Fail-over? – Application – Third-party solution • Cost?
  32. HA solutions
      DBMS | Cost | Disk-shared | Replication | Consistency | Auto-Failover
      Oracle RAC | +++++ | Shared everything | N/A | N/A | O
      MS-SQL Cluster | +++ | Shared everything | N/A | N/A | O
      MySQL Cluster | ++ | Shared nothing | Log Based | Async, Sync | O
      MySQL Replication + Third-party | Free | Shared nothing | Statement Based | Async | O
      CUBRID | Free | Shared nothing | Log Based | Sync, Semi-sync, Async | O
  33. Client Requests: 1. Non-stop 24/7 service uptime 2. No missing data between nodes. Phases: Phase 1 (v8.1.0), Phase 2 (v8.2.x), Phase 4 (v8.3.x), Phase 5 (v8.4.x), Phase 6 (Apricot). Replication; HA Support; Extended HA features; HA Monitoring + Easy Admin Scripts; Async; Auto Fail-over; HA Status Monitoring; HA Performance +; Reduce Replication Delay Time; CUBRID Heartbeat; HA + Replica; Admin Scripts; Read-Write Service during DB maintenance; Async, Semi-sync, Sync; Broker Modes (RW, RO)
  34. N:N Master:Slave. Configurations: 1:1 M:S, 1:N M:S, 1:1:N M:S:R, N:N M:S, N:1 M:S. http://www.cubrid.org/cubrid_ha_oscon
  35. CUBRID HA: Benefits • Non-stop maintenance • Auto Fail-over • Large Installations are Easy • Load balancing • Accurate and reliable Failure detection • Various Master-Slave Configurations: – 3 replication modes – 3 broker modes
  36. Database Sharding • Partitioning: divide the data between multiple tables within one Database Instance • Sharding: divide the data between multiple tables created in separate Database Instances. [Diagram: one DB holding X, Y, Z versus shards DB X, DB Y, DB Z]
  37. Without Database Sharding [Diagram: App, Broker, DB with Tbl1, Tbl2, Tbl3, Tbl4]
  38. With Database Sharding [Diagram: App, Broker, DB with Tbl1, Tbl2, Tbl3, Tbl4]
  39. CUBRID SHARD. Phases: Phase 1 (Apricot), Phase 2 (Banana). Features: Unlimited Shards; Data Rebalancing; Multiple Shard ID Gen. Algorithm; Connection & Statement Pooling; Load Balancing; HA Support; CUBRID, MySQL, Oracle Support
  40. Sharding: Benefits • Developer friendly – Single database view – No more application logic – No application changes • Multiple sharding strategies • Native scale-out support • Load balancing • Support for heterogeneous databases
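The sharding slides boil down to one routing decision: given a shard key, pick the database instance that holds the row. A minimal Python sketch of one possible strategy, hash modulo the number of shards, is shown below. It is an assumption for illustration only; the slides do not say which shard ID generation algorithms CUBRID SHARD uses, and `ShardRouter` and the shard names are invented here.

```python
import zlib


class ShardRouter:
    """Map a shard key to one of several database instances (hypothetical)."""

    def __init__(self, shard_names):
        if not shard_names:
            raise ValueError("at least one shard is required")
        self.shard_names = list(shard_names)

    def shard_for(self, shard_key):
        # crc32 is stable across processes, unlike Python's built-in hash().
        digest = zlib.crc32(str(shard_key).encode("utf-8"))
        return self.shard_names[digest % len(self.shard_names)]


router = ShardRouter(["db_x", "db_y", "db_z"])
for user_id in (1, 2, 3, 42):
    print(user_id, router.shard_for(user_id))
```

With plain modulo routing, adding a shard changes the target of most keys, which is one reason a data rebalancing feature such as the one listed on slide 39 matters.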
  41. #3 Ease of Use
  42. Client Requests: SQL Compatibility. Phases: Phase 1 (v.8.2.x), Phase 2 (v.8.3.x), Phase 4 (v8.4.x), Phase 6 (Apricot). Oracle; MySQL; MySQL; MySQL, Oracle. Hierarchical Query; SQL: 60+, PHP: 20+; SQL: 70+, PHP: 20+; Currency SQL; LOB, API++; Implicit Type Conversion; + Usability; + Usability+++; RegExpr; MSSQL win-back; MySQL, Oracle win-back: Monitoring system; Oracle: Ads, Shopping. > 90% MySQL SQL Compatibility
  43. Client Requests: 1. API Support 2. Ease of Migration 3. Usability. Tools: Phase 1 (v.8.1.x), Phase 2 (v.8.3.x), Phase 3 (v.8.4.x), Phase 3 (Apricot): CM; CM, CQB, CMT; CUNITOR; Web manager; CM Monitoring ++. APIs: Phase 1 (v.8.1.x), Phase 2 (v.8.2.x), Phase 3 (v.8.3.x), Phase 4 (v.8.4.x): CCI, JDBC, OLEDB; PHP, Python, Ruby; ODBC; Perl, ADO.NET
  44. MSSQL Win-Back in 2010. [Step 1] Dual Write: Application → Dual Read/Writer → MS SQL and CUBRID. [Step 2] Dual Write and Read: Application → Dual Read/Writer → MS SQL and CUBRID. [Step 3] Win-back Complete: Application → CUBRID (Read, Write). • 16 Master/Slave servers and 1 Archive server • DB size: 0.4~0.5 billion/DB, total 4 billion records; total 3.2 TB; total 4,000 ~ 5,000 QPS • Save money for MSSQL License and SAN Storage
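The three migration steps on slide 44 can be modelled as a small gateway that sits between the application and the two databases. The Python sketch below uses plain dictionaries in place of MS SQL and CUBRID, and it assumes that step 2 reads from both stores and records any disagreement; the slide only names the steps, so that comparison logic, the `Phase` enum and the `MigrationGateway` class are all hypothetical.

```python
from enum import Enum


class Phase(Enum):
    DUAL_WRITE = 1            # [Step 1] write to both, read from the old DB
    DUAL_WRITE_AND_READ = 2   # [Step 2] write to both, read from both
    WIN_BACK_COMPLETE = 3     # [Step 3] only the new DB is used


class MigrationGateway:
    """Stand-in for the "Dual Read/Writer" box; dicts play the databases."""

    def __init__(self, old_db, new_db, phase=Phase.DUAL_WRITE):
        self.old_db = old_db
        self.new_db = new_db
        self.phase = phase
        self.mismatches = []  # keys whose values differ between the stores

    def write(self, key, value):
        if self.phase is not Phase.WIN_BACK_COMPLETE:
            self.old_db[key] = value
        self.new_db[key] = value

    def read(self, key):
        if self.phase is Phase.DUAL_WRITE:
            return self.old_db.get(key)
        if self.phase is Phase.DUAL_WRITE_AND_READ:
            old, new = self.old_db.get(key), self.new_db.get(key)
            if old != new:
                self.mismatches.append(key)
            return old  # the old DB stays authoritative until cut-over
        return self.new_db.get(key)


old_db, new_db = {"a": 1}, {}
gateway = MigrationGateway(old_db, new_db)
gateway.write("b", 2)
gateway.phase = Phase.DUAL_WRITE_AND_READ
gateway.read("a")  # "a" was never copied to the new store, so it is flagged
gateway.phase = Phase.WIN_BACK_COMPLETE
gateway.write("c", 3)
print(gateway.mismatches, sorted(new_db))
```

Keeping the old database authoritative during the first two phases means the application can fall back at any point before the final cut-over.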
  45. Oracle Win-Back in 2011: System Monitoring Service. [Diagram: ORACLE Enterprise, ORACLE Standard and CUBRID servers; labels: 40 servers, 25 servers, 1 server] • DB size: 1.5 ~ 2.0 TB/DB, total 40 TB; 10~100K Inserts per second • Save money for Oracle License and SAN Storage
  46. What we have learnt so far and Where we are heading to?
  47. What we have learnt so far • Not easy to break users’ habits. • Need time. • Technical support is the key to acceptance! • Some services don’t deserve Oracle.
  48. CUBRID Deployment in NHN [Chart: ∑ services (axis 0 to 140) and ∑ deployments (axis 0 to 500) per period]. Periods: ~2009, 2010-1Q, 2010-2Q, 2010-3Q, 2010-4Q, 2011-1Q, 2011-2Q, 2011-3Q, 2011-4Q, 2012-1Q. ∑ services: 42, 50, 60, 69, 77, 82, 94, 100, 107, 117. ∑ deployments: 166, 181, 208, 259, 273, 283, 312, 326, 346, 500.
  49. CUBRID Achievements: Stability, Performance, Scalability, Ease of Use • Human vs. DB Errors • # of customers • Smart Index Optimizations • Shared Query Caching • Web Optimized Features • Load Balancer • High-Availability w/ auto fail-over • Sharding • Data Rebalancer • Cluster • > 90% MySQL SQL Compatibility • Native Migration Tool • Native GUI DB Management Tools • Monitoring Tools
  50. CUBRID Roadmap. 8.4.x • Performance++: Covering index, Key limit, Range scan • SQL Compatibility+: 70+ new syntax • HA++: Monitoring tools • I18N, L10N: 2~3 European charsets • SQL Compatibility++: Cursor holdability, Mass table UPDATE & DELETE • I18N, L10N+: more charsets • Performance+++ • SQL monitoring performance+ • SQL Compatibility+++ • Table Partitioning Improvements • DB SHARDING+ • Performance++++ • CUBRID Lite • SQL Compatibility++++ • DB Monitoring Improvements • Arcus Caching Integration
  51. CUBRID is Big now. What can you do? 1. Keep watching it 2. Consider using 3. Discuss, talk, write about CUBRID 4. Support CUBRID in your apps 5. Contribute to CUBRID 6. Provide CUBRID service
  52. esen.sagynov@nhn.com kadishmal@gmail.com eugen.stoianovici@arnia.ro www.cubrid.org www.facebook.com/cubrid www.twitter.com/cubrid
  53. . . . • How do CUBRID developers cope with stress? – Join MySQL issue tracker ;) • Want more? – Follow us to the next room. We’ll have more discussions!
