Scaling SQL and NoSQL Databases in the Cloud
 

Like this? Share it with your network

Share

Scaling SQL and NoSQL Databases in the Cloud

on

  • 2,653 views

Database performance is the number-one cause of poor performance for scalable web applications, and the problem is magnified in cloud environments where I/O and bandwidth are generally slower and less ...

Database performance is the number-one cause of poor performance for scalable web applications, and the problem is magnified in cloud environments where I/O and bandwidth are generally slower and less predictable than in dedicated data centers. Database sharding is a highly effective method of removing the database scalability barrier by operating on top of proven RDBMS products such as MySQL and Postgres as well as the new NoSQL database platforms. In this session, you'll learn what it really takes to implement sharding, the role it plays in the effective end-to-end lifecycle management of your entire database environment, why it is crucial for ensuring reliability, and how to choose the best technology for a specific application. We'll also share a case study on a high-volume social networking application that demonstrates the effectiveness of database sharding for scaling cloud-based applications.

Statistics

Views

Total Views
2,653
Views on SlideShare
2,572
Embed Views
81

Actions

Likes
4
Downloads
69
Comments
2

4 Embeds 81

http://anamzahid.wordpress.com 33
http://www.rightscale.com 31
http://paper.li 15
http://localhost 2

Accessibility

Categories

Upload Details

Uploaded via as Microsoft PowerPoint

Usage Rights

© All Rights Reserved

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
    Processing…
Post Comment
Edit your comment
  • Writes are linear, reads can be faster – depending on your database architecture.
  • How did your database perform 6 months or a year ago?
  • NoSQL is attractive due to simple API, ease of use. But beware, you really need a solid data design for exactly what you want to do. Relationships are not handled by the database, they are handled by the developer.
  • Difference between Black box sharding/NoSQL and App Aware sharding with an RDBMS – you can get sets of related data from the same shard, otherwise need to retrieve a row at a time and consolidate

Scaling SQL and NoSQL Databases in the Cloud Presentation Transcript

  • 1. Scaling your Database in the Cloud
    Spring2011
    Presented by:
    Cory Isaacson, CEO
    CodeFutures Corporation
    http://www.dbshards.com
    View the video of this presentation here!
  • 2. Introduction
    Who I am
    Cory Isaacson, CEO of CodeFutures
    Providers of dbShards
    Author of Software Pipelines
    Partnerships:
    Rightscale
    The leading Cloud Management Platform
    Leaders in database scalability, performance, and high-availability for the cloud…
    …based on real-world experience with dozens of cloud-based applications
    …social networking, gaming, data collection, mobile, analytics
    Objective is to provide useful experience you can apply to scaling (and managing) your database tier…
    …especially for high volume applications
  • 3. Challenges of cloud computing
    Cloud provides highly attractive service environment
    Flexible, scales with need (up or down)
    No need for dedicated IT staff, fixed facility costs
    Pay-as-you-go model
    Cloud services occasionally fail
    Partial network outages
    Server failures…
    …by their nature cloud servers are “transient”
    Disk volume issues
    Cloud-based resources are constrained
    CPU
    I/O Rates…
    …the “Cloud I/O Barrier”
  • 4. Typical Application Architecture
  • 5. Scaling in the Cloud
    Scaling Load Balancers is easy
    Stateless routing to app server
    Can add redundant Load Balancers if needed
    DNS round-robin or intelligent routing for larger sites
    If one goes down…
    …failover to another
    Scaling Application Servers is easy
    Stateless
    Sessions can easily transition to another server
    Add or remove servers as need dictates
    If one goes down…
    …failover to another
  • 6. Scaling in the Cloud
    Scaling the Database tier is hard
    “Stateful” by definition (and necessity…)
    Large, integrated data sets…
    10s of GBs to TBs (or more…)
    Difficult to move, reload
    I/O dependent…
    …adversely affected by cloud service failures
    …and slow cloud I/O
    If one goes down…
    …ouch!
    Databases form the “last mile” of true application scalability
    Initially simple optimization produces the best result
    Implement a follow-on scalability strategy for long-term performance goals…
    …plus a high-availability strategy is a must
  • 7. All CPUs wait at the same speed…
    The Cloud I/O Barrier
  • 8. More Database scalability challenges
    Databases have many other challenges that limit scalability
    ACID compliance…
    …especially Consistency
    …user contention
    Operational challenges
    Failover
    Planned, unplanned
    Maintenance
    Index rebuild
    Restore
    Space reclamation
    Lifecycle
    Reliable Backup/Restore
    Monitoring
    Application Updates
    Management
  • 9. Database slowdown is not linear…
  • 10. Challenges apply to all types of databases
    Traditional RDBMS (MySQL, Postgres, Oracle…)
    I/O bound
    Multi-user, lock contention
    High-availability
    Lifecycle management
    NoSQLDatabases In-memory, Caching, Document…)
    Reliability, High-availability
    Limits of a single server
    …and a single thread
    Data dumps to disk
    Lifecycle Management
    No matter what the technology, big databases are hard to manage…
    …elastic scaling is a real challenge
    …degradation from growth in size and volume is a certainty
    Application-specific database requirements add to the challenge
    …database design is key…
    …balance performance vs. convenience vs. data size
  • 11. The Laws of Databases
    Law #1: Small Databases are fast…
    Law #2: Big Databases are slow…
    Law #3: Keep databases small
  • 12. What is the answer?
    Database sharding is the only effective method for achieving scale, elasticity, reliability and easy management…
    …regardless of your database technology
  • 13. What is Database Sharding?
    “Horizontal partitioning is a database design principle whereby rows of a database table are held separately... Each partition forms part of a shard, which may in turn be located on a separate database server or physical location.” Wikipedia
  • 14. The key to Database Sharding…
  • 15. Database Sharding Architecture
  • 16. Database Sharding Architecture
  • 17. Database Sharding… the results
  • 18. Why does Database Sharding work?
    Maximize CPU/Memory per database instance…
    …as compared to database size
    Reduce the size of index trees
    Speeds writes dramatically
    No contention between servers
    Locking, disk, memory, CPU
    Allows for intelligent parallel processing…
    …Go Fish queries across shards
    Keep CPUs busy and productive
  • 19. Breaking the Cloud I/O Barrier
  • 20. Database types
    Traditional RDBMS
    Proven
    SQL language…
    …although ORM can change that
    Durable, transactional, dependable…
    …perceived as inflexible
    NoSQL
    Typically in-memory…
    …the real speed advantage
    Various flavors…
    Key/Value Store
    Document database
    Hybrid
    Store complex documents using a key
    In-memory and disk based
  • 21. Black box vs. Relational Sharding
    Both utilize sharding on a data key…
    …typically modulus on a value or consistent hash
    Black box sharding is automatic…
    …attempts to evenly shard across all available servers
    …no developer visibility or control
    …can work acceptably for simple, non-indexed NoSQL data stores
    …easily supports single-row/object results
    Relational sharding is defined by the developer…
    …selective sharding of large data
    …explicit developer control and visibility
    …tunable as the database grows and matures
    …more efficient for result sets and searchable queries
    …preserves data relationships
  • 22. How Database Sharding works for NoSQL
  • 23. Relational Sharding
    Shard-Tree Root Table
    Shard-Tree Child Tables
    Global Tables
  • 24. How Relational Sharding works
  • 25. How Relational Sharding works
    Shard key recognition in SQL
    SELECT * FROM customerWHERE customer_id = 1234
    INSERT INTO customer(customer_id, first_name, last_name, addr_line1,…)VALUES(2345, ‘John’, ‘Jones’, ‘123 B Street’,…)
    UPDATE customerSET addr_line1 = ‘456 C Avenue’WHERE customer_id = 4567
  • 26. What about Cross-Shard result sets?
  • 27. Cross-shard result set example
    Go Fish (no shard key)
    SELECT country_id, count(*) FROM customerGROUP BY country_id
  • 28. More on Cross-Shard result sets…
    Black Box approach requires “scatter gather” for multi-row/object result set…
    …common with NoSQL engines
    …forces use of denormalized lists
    …must be maintained by developer/application code
    …Map Reduce processing helps with this (non-realtime)
    Relational sharding provides access to meaningful result sets of related data…
    …aggregation, sort easier to perform
    …logical search operations more natural
    …related rows are stored together
  • 29. Make your Application Shard-Safe
    For sharding to be efficient, every sharded table should include a shard key.
    SELECT WHERE clause statements should include a shard key where possible so that the query goes directly to a single shard rather than being executed against multiple shards in parallel.
    INSERT, UPDATE and DELETE WHERE CLAUSE statements must include a shard key so that the data is written to the correct shard.
    Avoid cross-shard inner joins. For example, joining one customer to another is not feasible as both customers may be in different shards. This can be accomplished, but requires complex logic and can adversely affect performance.
  • 30. What about High-Availability?
    Can you afford to take your databases offline:
    …for scheduled maintenance?
    …for unplanned failure?
    …can you accept some lost transactions?
    By definition Database Sharding adds failure points to the data tier
    A proven High-Availability strategy is a must…
    …system outages…especially in the cloud
    …planned maintenance…a necessity for all database engines
  • 31. Traditional asynch replication is inadequate…
    • Replication “Lag” can result in lost transactions
    • 32. No rapid failover for continuous operation
    • 33. Maintenance difficult in production environments
    • 34. Used by many NoSQL engines
  • Database Sharding and High-Availability
    • Need a minimum of 2 active-active instances per shard…
    • 35. …3 active-active instances for in-memory databases
    • 36. Support for…
    • 37. …fail-down
    • 38. …failover
    • 39. …maintenance
    • 40. …automated backup/restore
    • 41. …monitoring
  • Database Sharding…elastic shards
  • 42. Database Sharding…elastic shards
    Expand the number of shards…
    …divide a single shard into N new shards
    …or rebalance existing shards across N new shards
    Contract the number of shards…
    …consolidate N shards into a single shard
    Due to large data sizes, this takes time…
    …regardless of data architecture
    Requires High-Availability to ensure no downtime…
    …perform scaling on live replica in background
    …fail back to active instances
  • 43. Database Sharding…the future
    Ability to leverage proven database engines…
    …SQL
    …NoSQL
    …Caching
    Hybrid database infrastructures using the best of all technologies
    In-memory and disk-based solutions co-existing
    Allow developers to select the best database engine for a given set of application requirements…
    …seamless context-switching within the application
    …use the API of choice
    Improved management…
    …monitoring
    …configuration “on-the-fly”
    …dynamic elastic shards based on demand
  • 44. Database Sharding summary
    Database Sharding is the only effective tool for scaling your database tier in the cloud…
    …or anywhere
    Use Database Sharding for any type of engine…
    …SQL
    …NoSQL
    …Cache
    Use the best engine for a given application requirement…
    …avoid the “one-size-fits-all” trap
    …each engine has its strengths to capitalize on
    Ensure your High-Availability is proven and bulletproof…
    …especially critical on the cloud
    …must support failure and maintenance
  • 45. Questions/Answers
    Cory Isaacson
    CodeFutures Corporation
    cory.isaacson@codefutures.com
    http://www.dbshards.com