Your SlideShare is downloading. ×

CodeFutures - Scaling Your Database in the Cloud


Published on

RightScale Conference Santa Clara 2011: Scaling an application in the cloud often hits the most common bottleneck – the database tier. Not only is database performance the number one cause of poor …

RightScale Conference Santa Clara 2011: Scaling an application in the cloud often hits the most common bottleneck – the database tier. Not only is database performance the number one cause of poor application performance, but also the issue is magnified in cloud environments where I/O and bandwidth is generally slower and less predictable than in dedicated data centers. Database sharding is a highly effective method of removing the database scalability barrier, operating on top of proven RDBMS products such as MySQL and Postgres – as well as the new NoSQL database platforms. One critical aspect often given too little consideration is monitoring and continuous operation of your databases, including the full lifecycle, to ensure that they stay up.

Published in: Technology
  • Be the first to comment

No Downloads
Total Views
On Slideshare
From Embeds
Number of Embeds
Embeds 0
No embeds

Report content
Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

No notes for slide


  • 1. Scaling your Database inthe CloudPresented by:Cory Isaacson, CEOCodeFutures Corporationhttp://www.dbshards.comWatch the video of this presentation
  • 2. Introduction Who I am  Cory Isaacson, CEO of CodeFutures  Providers of dbShards  Author of Software Pipelines Leaders in scalability, performance, high-availability and database solutions…  …based on real-world experience with dozens of cloud- based and data center hosted applications  …social networking, gaming, data collection, mobile, analytics Objective is to provide useful experience you can apply to scaling (and managing) your database tier…  …especially for high volume applications
  • 3. Challenges of databasescalability Databases grow…  …and get slower Why?  User content, locking  Connection limitations  Index Depth  Table Sizes  I/O bottlenecks Challenges apply regardless of hosting environment
  • 4. Challenges in the cloudenvironment Cloud provides highly attractive service environment  Flexible, scales with need (up or down)  No need for dedicated IT staff, fixed facility costs  Pay-as-you-go model Cloud services occasionally fail  Partial network outages  Server failures…  …by their nature cloud servers are “transient”  Disk volume issues Cloud-based resources are constrained  CPU  I/O Rates…  …the “Cloud I/O Barrier”
  • 5. Typical Application Architecture
  • 6. Scaling your application Scaling Load Balancers is easy  Stateless routing to app server  Can add redundant Load Balancers if needed  DNS round-robin or intelligent routing for larger sites  If one goes down…  …failover to another Scaling Application Servers is easy  Stateless  Add or remove servers as need dictates  If one goes down…  …failover to another
  • 7. Scaling your application Scaling the Database tier is hard  “Stateful”by definition (and necessity…)  Large, integrated data sets…  10s of GBs to TBs (or more…)  Difficult to move, reload  I/O dependent…  …adversely affected by cloud service failures  …and slow cloud I/O  If one goes down…  …ouch!
  • 8. Scaling your application Databases form the “last mile” of true application scalability  Initially simple optimization produces the best result  Implement a follow-on scalability strategy for long-term performance goals…  …plus a high-availability strategy is a must The best time to plan your database scalability strategy is now  don‟t wait until it‟s a “3-alarm fire”
  • 9. More Database scalabilitychallenges Databases have many other challenges that limit scalability  ACID compliance…  …especially Consistency  …user contention  Preserving Data Relationships…  Referential Integrity  Operational challenges  Failover  Planned, unplanned  Maintenance  Index rebuild  Restore  Space reclamation  Lifecycle  Reliable Backup/Restore  Monitoring  Application Updates  Management
  • 10. FamilyBuilder Innovator in Facebook applications Among first 500 apps worldwide 50MM Users, 100,000 new users/day
  • 11. All CPUs wait at the samespeed… The I/O Barrier
  • 12. Database slowdown is notlinear… Database Load Curve 10000 9000 8000 7000 Load Time 6000 5000 4000 Time 3000 Expon. (Time) 2000 1000 0 0 10 20 30 40 Data File (GB) GB Load Time (Min) .9 1 1.3 2.5 3.5 11.7 39.0 10 days…
  • 13. Challenges apply to all types ofdatabases Traditional RDBMS (MySQL, Postgres, Oracle…)  I/O bound  Multi-user, lock contention  High-availability  Lifecycle management NoSQL, “NewSQL”, Column Databases, Caching…  Reliability  Limits of a single server  …and a single thread  Data dumps to disk  High-availability  Lifecycle Management
  • 14. Challenges apply to all types ofdatabases No matter what the technology, big databases are hard to manage  elasticscaling is a real challenge  degradation from growth in size and volume is a certainty  application-specific database requirements add to the challenge Sound database design is key…  balance performance vs. convenience vs. data size
  • 15. The Laws of Databases Law #1: Small Databases are fast… Law #2: Big Databases are slow… Law #3: Keep databases small
  • 16. What is the answer? Database sharding is the only effective method for achieving scale, elasticity, reliability and easy management…  …regardless of your database technology
  • 17. What is Database Sharding? “Horizontal partitioning is a database design principle whereby rows of a database table are held separately... Each partition forms part of a shard, which may in turn be located on a separate database server or physical location.” Wikipedia
  • 18. What is Database Sharding? Start with a big monolithic database  break it into smaller databases  across many servers  using a key value
  • 19. The key to DatabaseSharding…
  • 20. Database Sharding Architecture
  • 21. Database Sharding Architecture
  • 22. Database Sharding… theresults
  • 23. Why does Database Shardingwork? Maximize CPU/Memory per database instance  as compared to database size Reduce the size of index trees  speeds writes dramatically  reads are faster too  aggregate, list queries are generally much faster No contention between servers  locking, disk, memory, CPU Allows for intelligent parallel processing  Go Fish queries across shards Keep CPUs busy and productive
  • 24. Breaking the I/O Barrier
  • 25. Database types Traditional RDBMS  Proven  SQL language…  …although ORM can change that  Durable, transactional, dependable…  …but inflexible NoSQL/NewSQL  In-memory, memory-cache to disk, single-threaded…  …the real speed advantage  Various flavors…  Key/Value Store  Document database  Hybrid
  • 26. Black box vs. RelationalSharding Both utilize sharding on a data key…  …typically modulus on a value or consistent hash Black box sharding is automatic…  …attempts to evenly shard across all available servers  …no developer visibility or control  …can work acceptably for simple, non-indexed NoSQL data stores  …easily supports single-row/object results Relational sharding is defined by the developer…  …selective sharding of large data  …explicit developer control and visibility  …tunable as the database grows and matures  …more efficient for result sets and searchable queries
  • 27. How Database Sharding works forNoSQL
  • 28. How Relational Sharding works Primary Shard Table Shard Child Tables Global Tables
  • 29. How Relational Sharding works
  • 30. How Relational Sharding works Shard key recognition in SQL  SELECT * FROM customer WHERE customer_id = 1234  INSERT INTO customer (customer_id, first_name, last_name, addr_line1, …) VALUES (2345, „John‟, „Jones‟, „123 B Street‟,…)  UPDATE customer SET addr_line1 = „456 C Avenue‟ WHERE customer_id = 4567
  • 31. What about Multi-Shard resultsets?
  • 32. Multi-shard result set example Go Fish (no shard key)  SELECT country_id, count(*) FROM customer GROUP BY country_id
  • 33. More on Multi-Shard resultsets… Black Box approach requires “scatter gather” for multi-row/object result set…  …common with NoSQL engines  …forces use of denormalized lists  …must be maintained by developer/application code  …Map Reduce processing helps with this (non- realtime) Relational Sharding provides access to meaningful result sets of related data…  …aggregation, sort easier to perform  …logical search operations more natural
  • 34. The Database ShardingProcess
  • 35. What about High-Availability? Can you afford to take your databases offline:  …for scheduled maintenance?  …for unplanned failure?  …can you accept some lost transactions? By definition Database Sharding adds failure points to the data tier A proven High-Availability strategy is a must…  …system outages…especially in the cloud  …planned maintenance…a necessity for all database engines Disaster-recovery…  …multi-region  …multi-cloud
  • 36. Database Sharding…elasticshards
  • 37. Database Sharding summary Database Sharding is the most effective tool for scaling your database tier in the cloud…  …or anywhere Use Database Sharding for any type of engine…  …SQL  …NoSQL  …Column Db  …Cache Use the best engine for a given application requirement…  …avoid the “one-size-fits-all” trap  …each engine has its strengths to capitalize on Ensure your High-Availability is proven and bulletproof…  …especially critical on the cloud  …must support failure and maintenance
  • 38. Questions/Answers Cory Isaacson CodeFutures Corporation cory.isaacson@codefutures.c om