
Scaling SQL and NoSQL Databases in the Cloud

Cory Isaacson, CEO at CodeFutures, led this session at the RightScale User Conference 2010 in Santa Clara.

Session Abstract: Database performance is the number one cause of poor application performance. Exacerbating the problem are cloud environments where I/O and bandwidth are generally slower and less predictable than in dedicated data centers. Database sharding is a highly effective method of removing the database scalability barrier, operating on top of proven RDBMS products such as MySQL and Postgres as well as the new NoSQL database platforms. In this session, you'll learn what it really takes to implement sharding, reliability, and end-to-end lifecycle management for your entire database environment. We'll present an example of seamlessly combining SQL and NoSQL databases into a unified infrastructure along with a case study that shows how effective database sharding allowed a high-volume social networking application to scale in the cloud.

  • Writes are linear, reads can be faster – depending on your database architecture.
  • How did your database perform 6 months or a year ago?
  • NoSQL is attractive due to its simple API and ease of use. But beware: you really need a solid data design for exactly what you want to do. Relationships are not handled by the database; they are handled by the developer.
  • The difference between black box sharding/NoSQL and application-aware sharding with an RDBMS: with application-aware sharding you can get sets of related data from the same shard; otherwise you need to retrieve a row at a time and consolidate the results.
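The contrast drawn in the last note (related data from one shard versus row-at-a-time retrieval and consolidation) can be illustrated with a minimal Python sketch. It assumes in-memory dictionaries as stand-ins for shards and modulus sharding on an integer key. The class names `AppAwareStore` and `BlackBoxStore`, the customer/order data, and the shard count of 4 are hypothetical and are not taken from the presentation or from dbShards.

```python
from collections import defaultdict

NUM_SHARDS = 4  # hypothetical shard count, chosen only for illustration


def shard_for(key: int, num_shards: int = NUM_SHARDS) -> int:
    """Modulus sharding on an integer key."""
    return key % num_shards


class AppAwareStore:
    """Orders are stored on the shard of their owning customer,
    so all orders for one customer live together."""

    def __init__(self, num_shards: int = NUM_SHARDS):
        self.num_shards = num_shards
        self.shards = [defaultdict(list) for _ in range(num_shards)]
        self.queries = 0  # number of shard round trips performed

    def add_order(self, customer_id: int, order: str) -> None:
        shard = self.shards[shard_for(customer_id, self.num_shards)]
        shard[customer_id].append(order)

    def orders_for(self, customer_id: int) -> list:
        self.queries += 1  # a single shard holds the whole result set
        shard = self.shards[shard_for(customer_id, self.num_shards)]
        return list(shard.get(customer_id, []))


class BlackBoxStore:
    """Each order is placed by its own id, so one customer's
    orders end up spread across shards."""

    def __init__(self, num_shards: int = NUM_SHARDS):
        self.num_shards = num_shards
        self.shards = [dict() for _ in range(num_shards)]
        self.queries = 0

    def add_order(self, order_id: int, customer_id: int, order: str) -> None:
        shard = self.shards[shard_for(order_id, self.num_shards)]
        shard[order_id] = (customer_id, order)

    def orders_for(self, customer_id: int) -> list:
        results = []
        for shard in self.shards:  # scatter: ask every shard
            self.queries += 1
            results.extend(o for (c, o) in shard.values() if c == customer_id)
        return sorted(results)  # gather: consolidate in the application


aware, black = AppAwareStore(), BlackBoxStore()
for order_id in range(8):
    customer_id = 7 if order_id % 2 == 0 else 9
    aware.add_order(customer_id, f"order-{order_id}")
    black.add_order(order_id, customer_id, f"order-{order_id}")

print(aware.orders_for(7), "shard queries:", aware.queries)
print(black.orders_for(7), "shard queries:", black.queries)
```

Both stores return the same orders, but the application-aware store answers from one shard while the black box store has to visit every shard and merge the results itself.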

    1. Scaling your Database in the Cloud
       Fall 2010
       Presented by:
       Cory Isaacson, CEO
       CodeFutures Corporation
       http://www.dbshards.com
    2. Introduction
       Who I am
       Cory Isaacson, CEO of CodeFutures
       Providers of dbShards
       Author of Software Pipelines
       Partnerships:
       RightScale
       The leading Cloud Management Platform
       myCloudWatcher
       Provider of cloud monitoring and management services
       Leaders in scalability, performance, high-availability and database solutions for the cloud…
       …based on real-world experience with dozens of cloud-based applications
       …social networking, gaming, data collection, mobile, analytics
       Objective is to provide useful experience you can apply to scaling (and managing) your database tier…
       …especially for high volume applications
    3. Challenges of cloud computing
       Cloud provides highly attractive service environment
       Flexible, scales with need (up or down)
       No need for dedicated IT staff, fixed facility costs
       Pay-as-you-go model
       Cloud services occasionally fail
       Partial network outages
       Server failures…
       …by their nature cloud servers are “transient”
       Disk volume issues
       Cloud-based resources are constrained
       CPU
       I/O Rates…
       …the “Cloud I/O Barrier”
    4. Typical Application Architecture
    5. Scaling in the Cloud
       Scaling Load Balancers is easy
       Stateless routing to app server
       Can add redundant Load Balancers if needed
       DNS round-robin or intelligent routing for larger sites
       If one goes down…
       …failover to another
       Scaling Application Servers is easy
       Stateless
       Add or remove servers as need dictates
       If one goes down…
       …failover to another
    6. Scaling in the Cloud
       Scaling the Database tier is hard
       “Stateful” by definition (and necessity…)
       Large, integrated data sets…
       10s of GBs to TBs (or more…)
       Difficult to move, reload
       I/O dependent…
       …adversely affected by cloud service failures
       …and slow cloud I/O
       If one goes down…
       …ouch!
       Databases form the “last mile” of true application scalability
       Initially simple optimization produces the best result
       Implement a follow-on scalability strategy for long-term performance goals…
       …plus a high-availability strategy is a must
    7. All CPUs wait at the same speed…
       The Cloud I/O Barrier
    8. More Database scalability challenges
       Databases have many other challenges that limit scalability
       ACID compliance…
       …especially Consistency
       …user contention
       Operational challenges
       Failover
       Planned, unplanned
       Maintenance
       Index rebuild
       Restore
       Space reclamation
       Lifecycle
       Reliable Backup/Restore
       Monitoring
       Application Updates
       Management
    9. Database slowdown is not linear…
    10. Challenges apply to all types of databases
        Traditional RDBMS (MySQL, Postgres, Oracle…)
        I/O bound
        Multi-user, lock contention
        High-availability
        Lifecycle management
        In-memory Databases (NoSQL, Caching, Specialty…)
        Reliability
        Limits of a single server
        …and a single thread
        Data dumps to disk
        High-availability
        Lifecycle Management
        No matter what the technology, big databases are hard to manage…
        …elastic scaling is a real challenge
        …degradation from growth in size and volume is a certainty
        Application-specific database requirements add to the challenge
        …database design is key…
        …balance performance vs. convenience vs. data size
    11. The Laws of Databases
        Law #1: Small Databases are fast…
        Law #2: Big Databases are slow…
        Law #3: Keep databases small
    12. What is the answer?
        Database sharding is the only effective method for achieving scale, elasticity, reliability and easy management…
        …regardless of your database technology
    13. What is Database Sharding?
        “Horizontal partitioning is a database design principle whereby rows of a database table are held separately... Each partition forms part of a shard, which may in turn be located on a separate database server or physical location.” Wikipedia
    14. The key to Database Sharding…
    15. Database Sharding Architecture
    16. Database Sharding Architecture
    17. Database Sharding… the results
    18. Why does Database Sharding work?
        Maximize CPU/Memory per database instance…
        …as compared to database size
        No contention between servers
        Locking, disk, memory, CPU
        Allows for intelligent parallel processing…
        …Go Fish queries across shards
        Keep CPUs busy and productive
    19. Breaking the Cloud I/O Barrier
    20. Database types
        Traditional RDBMS
        Proven
        SQL language…
        …although ORM can change that
        Durable, transactional, dependable…
        …but inflexible
        NoSQL
        Typically in-memory…
        …the real speed advantage
        Various flavors…
        Key/Value Store
        Document database
        Hybrid
    21. Black box vs. Application-Aware Sharding
        Both utilize sharding on a data key…
        …typically modulus on a value or consistent hash
        Black box sharding is automatic…
        …attempts to evenly shard across all available servers
        …no developer visibility or control
        …can work acceptably for simple, non-indexed NoSQL data stores
        …easily supports single-row/object results
        Application-Aware sharding is defined by the developer…
        …selective sharding of large data
        …explicit developer control and visibility
        …tunable as the database grows and matures
        …more efficient for result sets and searchable queries
    22. How Database Sharding works for NoSQL
    23. How Database Sharding works for SQL
        Primary Shard Table
        Shard Child Tables
        Global Tables
    24. How Database Sharding works for SQL
    25. What about Cross-Shard result sets?
    26. More on Cross-Shard result sets…
        Black Box approach requires “scatter gather” for multi-row/object result set…
        …common with NoSQL engines
        …forces use of denormalized lists
        …must be maintained by developer/application code
        …Map Reduce processing helps with this (non-realtime)
        Application-Aware provides access to meaningful result sets of related data…
        …aggregation, sort easier to perform
        …logical search operations more natural
    27. What about High-Availability?
        Can you afford to take your databases offline:
        …for scheduled maintenance?
        …for unplanned failure?
        …can you accept some lost transactions?
        By definition Database Sharding adds failure points to the data tier
        A proven High-Availability strategy is a must…
        …system outages…especially in the cloud
        …planned maintenance…a necessity for all database engines
    28. Traditional asynch replication is inadequate…
        Replication “Lag” can result in lost transactions
        No rapid failover for continuous operation
        Maintenance difficult in production environments
        Used by many NoSQL engines
    29. Database Sharding and High-Availability
        Need a minimum of 2 active-active instances per shard…
        …3 active-active instances for in-memory databases
        Support for…
        …fail-down
        …failover
        …maintenance
        …automated backup/restore
        …monitoring
    30. Database Sharding…elastic shards
    31. Database Sharding…elastic shards
        Expand the number of shards…
        …divide a single shard into N new shards
        Contract the number of shards…
        …consolidate N shards into a single shard
        Due to large data sizes, this takes time…
        …regardless of data architecture
        Requires High-Availability to ensure no downtime…
        …perform scaling on live replica
    32. Database Sharding…the future
        Ability to leverage proven database engines…
        …SQL
        …NoSQL
        …Caching
        Allow developers to select the best database engine for a given set of application requirements…
        …seamless context-switching within the application
        …use the API of choice
        Improved management…
        …monitoring
        …configuration “on-the-fly”
        …dynamic elastic shards based on demand
    33. Database Sharding summary
        Database Sharding is the most effective tool for scaling your database tier in the cloud…
        …or anywhere
        Use Database Sharding for any type of engine…
        …SQL
        …NoSQL
        …Cache
        Use the best engine for a given application requirement…
        …avoid the “one-size-fits-all” trap
        …each engine has its strengths to capitalize on
        Ensure your High-Availability is proven and bulletproof…
        …especially critical on the cloud
        …must support failure and maintenance
    34. Questions/Answers
        Cory Isaacson
        CodeFutures Corporation
        cory.isaacson@codefutures.com
        http://www.dbshards.com
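The expand and contract operations named on the elastic shards slide (divide a single shard into N new shards, consolidate N shards into a single shard) can be sketched roughly as follows. This is an illustration under stated assumptions, not the dbShards mechanism: it assumes modulus sharding on an integer key and uses plain dictionaries in place of database instances. The function names `split_shard` and `consolidate` are hypothetical. The split reads from its input without modifying it, which loosely mirrors the slide's point about performing scaling on a replica rather than on the serving copy.

```python
def shard_for(key: int, num_shards: int) -> int:
    """Modulus sharding on an integer key."""
    return key % num_shards


def split_shard(rows: dict, old_num_shards: int, n: int) -> list:
    """Divide the rows of one modulus shard into n new shards.

    Keys inside one modulus shard are spaced old_num_shards apart, so
    key // old_num_shards counts them consecutively; taking that value
    modulo n spreads the rows evenly. The input dict is left untouched.
    """
    new_shards = [dict() for _ in range(n)]
    for key, value in rows.items():
        new_shards[(key // old_num_shards) % n][key] = value
    return new_shards


def consolidate(shards: list) -> dict:
    """Merge N shards back into a single shard."""
    merged = {}
    for shard in shards:
        merged.update(shard)
    return merged


# Hypothetical data: shard 1 of 4 holds keys 1, 5, 9, ..., 37.
rows = {k: f"row-{k}" for k in range(1, 40, 4)}
parts = split_shard(rows, old_num_shards=4, n=2)
print([sorted(p) for p in parts])
print(consolidate(parts) == rows)
```

With real data volumes the copy step dominates, which is why the slide notes that the operation takes time regardless of data architecture.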
