
How RightScale Architects Its Own Databases for Worldwide Scale, HA, and DR Scenarios


Is your database holding back your application? Find out how we at RightScale use SQL and NoSQL databases such as MySQL and Cassandra to provide a scalable, distributed, and highly available service around the world that is designed to recover from the failure of a whole cloud region.


Transcript

  • 1. How RightScale Architects Its Databases (for Worldwide Scale, HA and DR Scenarios)
      Josep Blanquer, Senior Systems Architect, RightScale
  • 2. Menu
      • Intro
      • Data Taxonomy
      • Data Storage Design
      • Scale, HA and DR
      • Conclusion
      Talk with the Experts.
  • 3. Intro: Expectations and scope (what this is and what it is not)
      • It IS a talk about:
          • how RightScale has designed and implemented its backing datastores
          • …for a few of the most representative internal systems
          • …with the rationale behind it
      • It is NOT a talk about:
          • RightScale’s overall architecture
          • nodes or hosts (it’s about systems)
          • RightScale’s data modeling
      Note: Most of the design is implemented and in production, but some of the most advanced parts are still in beta or still being worked on.
  • 4. Intro: Tools and Technologies
      • RightScale uses a mix of RDBMS and NoSQL technologies:
          • MySQL, Cassandra, and S3 (for backups and archiving)
      • Transactionality:
          • MySQL: strong ACID properties
          • Cassandra: no atomicity, eventually consistent, some isolation, durable
      • Availability:
          • MySQL: async replication, Master-SlaveN or Master-Master
          • Cassandra: distributed, master-less, highly replicated (multi-DC)
      • Sharding:
          • MySQL: no explicit inter-node tools (sharding is done by the application)
          • Cassandra: partitions data internally across nodes
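Because MySQL leaves sharding to the application, every query has to be routed by application code. The sketch below is a hypothetical illustration, not RightScale's implementation: it assumes a fixed list of shards, modulo placement by account ID, and a Master-SlaveN layout in which writes go to the master and reads are spread round-robin over the slaves. `Shard`, `AppShardRouter`, and the host names are made-up names.

```python
import itertools


class Shard:
    """One MySQL replication group: a master plus zero or more async slaves."""

    def __init__(self, master: str, slaves: list[str]) -> None:
        self.master = master
        self.slaves = slaves
        # Round-robin over the slaves; fall back to the master if there are none
        self._reads = itertools.cycle(slaves or [master])

    def next_reader(self) -> str:
        return next(self._reads)


class AppShardRouter:
    """Application-side router: MySQL itself does not know which node owns an account."""

    def __init__(self, shards: list[Shard]) -> None:
        if not shards:
            raise ValueError("need at least one shard")
        self.shards = shards

    def shard_for(self, account_id: int) -> Shard:
        # Hypothetical placement rule: account ID modulo the number of shards
        return self.shards[account_id % len(self.shards)]

    def host_for(self, account_id: int, write: bool) -> str:
        shard = self.shard_for(account_id)
        # Writes must hit the master; reads can use a (possibly lagging) slave
        return shard.master if write else shard.next_reader()


router = AppShardRouter([Shard("m0", ["s0a", "s0b"]), Shard("m1", ["s1a"])])
print(router.host_for(4, write=True))   # m0
print(router.host_for(4, write=False))  # s0a
```

Since MySQL replication is asynchronous, a read sent to a slave may not yet see a write just made on the master; code that needs to read its own writes should route those reads with `write=True`.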
  • 5. Taxonomy of RightScale’s Data
      Representative systems with different data semantics:
      • Global Objects
          • Marketplace Assets
      • Dashboard Objects
          • Audits
          • Tags
          • Recent Events
      • Cloud Polling Data
      • Routing Data
      • Monitoring/Syslog
  • 6. Taxonomy of RightScale’s Data: Global Objects (Marketplace Assets)
      • Common across accounts:
          • Users
          • Plans
          • Settings
      • MultiCloud Marketplace:
          • Published Assets
          • Sharing Groups
          • …
  • 7. Taxonomy of RightScale’s Data: Dashboard Objects (Audits, Tags, Recent Events)
      • Private to each account:
          • Deployments
          • Imported assets
          • Alert Specifications
          • Server Inputs
          • Audits
          • Tags
          • User Events
          • …
  • 8. Taxonomy of RightScale’s Data: Cloud Polling Data
      • Private to each account:
          • Cloud resource states (cache)
          • Cloud credentials
  • 9. Taxonomy of RightScale’s Data: Routing Data
      • Private to each account:
          • Instance agent locations
          • Core agent locations
          • Agent action registry
          • …
  • 10. Taxonomy of RightScale’s Data: Monitoring/Syslog
      • Private to each account:
          • Collected metric data
          • Collected syslog data
          • …
  • 11. Taxonomy of RightScale’s Data: who uses the data?
      • Users, through the Dashboard/API
      • Instances, from the cloud
      Data placement:
        • Users: data close to the users (Global Objects, Marketplace Assets, Dashboard Objects, Audits, Tags, Recent Events)
        • Instances: data close to the cloud (Cloud Polling Data, Routing Data, Monitoring/Syslog)
  • 12. Taxonomy of RightScale’s Data: which data do we need?
      • Data for all accounts
      • Data for a single account
      Data scope and containment:
        • X-acct: data shared between accounts (Global Objects, Marketplace Assets)
        • Account: data required within the scope of a single account (Dashboard Objects, Audits, Tags, Recent Events, Cloud Polling Data, Routing Data, Monitoring/Syslog)
  • 13. Taxonomy of RightScale’s Data: proximity to user vs. cloud (who uses the data?) and scope of data available (which data do we need?)
      • Global Objects, Marketplace Assets: close to the user; globally accessible data (X-acct)
      • Dashboard Objects, Audits, Tags, Recent Events: close to the user; account-shardable data (Account)
      • Cloud Polling Data, Routing Data, Monitoring/Syslog: close to cloud resources (Instances); account-shardable* data (Account)
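The two axes on this slide can be read as a small decision table. The sketch below encodes the slide's classification and maps each combination to the kind of partition introduced in the later slides (a global store, account Clusters, and cloud Islands). The enum and function names are hypothetical, and the slide's asterisk on cloud-side data (for example, core agent routing data is global) is not modeled.

```python
from enum import Enum


class Proximity(Enum):
    USER = "close to the user"
    CLOUD = "close to cloud resources"


class Scope(Enum):
    CROSS_ACCOUNT = "globally accessible"
    ACCOUNT = "account-shardable"


# Classification as given on the slide
SYSTEMS = {
    "Global Objects": (Proximity.USER, Scope.CROSS_ACCOUNT),
    "Marketplace Assets": (Proximity.USER, Scope.CROSS_ACCOUNT),
    "Dashboard Objects": (Proximity.USER, Scope.ACCOUNT),
    "Audits": (Proximity.USER, Scope.ACCOUNT),
    "Tags": (Proximity.USER, Scope.ACCOUNT),
    "Recent Events": (Proximity.USER, Scope.ACCOUNT),
    "Cloud Polling Data": (Proximity.CLOUD, Scope.ACCOUNT),
    "Routing Data": (Proximity.CLOUD, Scope.ACCOUNT),
    "Monitoring/Syslog": (Proximity.CLOUD, Scope.ACCOUNT),
}


def placement(system: str) -> str:
    """Map a system's (proximity, scope) pair to the kind of partition that holds it."""
    proximity, scope = SYSTEMS[system]
    if scope is Scope.CROSS_ACCOUNT:
        return "global"   # one store shared by every account
    if proximity is Proximity.USER:
        return "cluster"  # partitioned by account group
    return "island"       # partitioned by the cloud the instances run on


for name in ("Marketplace Assets", "Audits", "Routing Data"):
    print(name, placement(name))
```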
  • 14. (Diagram: X-Account and Account data, used by Users and Instances)
  • 15. X-Account data: the global store
      • Global: MySQL
          • ACID semantics
          • Master-Slave replication
      • Custom replication. Why custom? More control:
          • Multiple sources
          • Individual columns
          • Apply transformations
          • Smart re-sync features
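The slide lists what a custom replication layer buys over plain MySQL replication without showing how it works. The sketch below is a hypothetical illustration of those four ideas on in-memory rows: merging several sources, projecting individual columns, applying a transformation, and a re-sync that computes only the differences. The row layout (an `id` key), the function names, and the sample data are assumptions.

```python
from typing import Callable, Iterable

Row = dict[str, object]


def replicate(
    sources: Iterable[Iterable[Row]],
    columns: list[str],
    transform: Callable[[Row], Row] = lambda row: row,
) -> dict[object, Row]:
    """Build a replica from several sources, keeping only the chosen columns."""
    replica: dict[object, Row] = {}
    for source in sources:
        for row in source:
            # Project individual columns, then apply the per-row transformation.
            # On duplicate ids, rows from later sources overwrite earlier ones.
            projected = {col: row[col] for col in columns if col in row}
            replica[row["id"]] = transform(projected)
    return replica


def resync(replica: dict[object, Row], expected: dict[object, Row]) -> tuple[list, list]:
    """Compare a replica with the expected state; return (ids to upsert, ids to delete)."""
    upserts = [key for key, row in expected.items() if replica.get(key) != row]
    deletes = [key for key in replica if key not in expected]
    return upserts, deletes


users = [{"id": 1, "email": "A@X.COM", "pw_hash": "h1"}]
replica = replicate([users], ["id", "email"], lambda r: {**r, "email": str(r["email"]).lower()})
print(replica)  # {1: {'id': 1, 'email': 'a@x.com'}}
```

In this toy version, `resync` only reports the differences; a real system would also have to apply them and cope with rows changing while the comparison runs.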
  • 16. X-Account and Account data (global, dash, audit, tags, events, S3)
      • Dashboard: MySQL
          • Rows tagged by account
          • ACID semantics
          • Master-SlaveN replication
          • Slave reads
      • Other systems: Cassandra
          • Simpler key-value access
          • Great scalability
          • Great replica control
          • High write availability
          • Time-to-live expiration as cache
      • Data archive: S3
          • Low read rate
          • Globally accessible
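Cassandra lets each write carry a time-to-live (in CQL, `INSERT ... USING TTL <seconds>`), after which the value reads as absent; this is presumably what the slide means by "time-to-live expiration as cache". The sketch below is an in-memory stand-in for that behaviour, not a Cassandra client. The class name and keys are hypothetical, and the clock is injectable so that expiry can be tested without waiting.

```python
import time
from typing import Callable, Optional


class TTLStore:
    """In-memory stand-in for a key-value store where each write may carry a TTL."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        # key -> (value, absolute expiry time, or None for "never expires")
        self._data: dict[str, tuple[object, Optional[float]]] = {}

    def put(self, key: str, value: object, ttl: Optional[float] = None) -> None:
        expires_at = self._clock() + ttl if ttl is not None else None
        self._data[key] = (value, expires_at)

    def get(self, key: str) -> Optional[object]:
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if expires_at is not None and self._clock() >= expires_at:
            # Expired values read as missing, like a TTL'd column in Cassandra
            del self._data[key]
            return None
        return value


store = TTLStore()
store.put("recent_event:42", "server launched", ttl=3600)  # expires after one hour
store.put("tag:7", "env:prod")                              # no TTL: kept until overwritten
print(store.get("recent_event:42"))  # server launched
```

The appeal for cache-like data is that nothing has to sweep old entries explicitly: the writer decides the lifetime once, at write time.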
  • 17. So we can horizontally scale our dashboard by partitioning objects based on account groups: Clusters
      (Diagram: the global store serves Users; dash, audit, tags, events, and S3 are repeated per cluster)
  • 18. Clusters 1 to N, each with its own dash, audit, tags, events, and S3, serving sets of RightScale accounts (Account Set 1, Account Set 2, …)
      • Features:
          • 1 cluster: N accounts
          • 1 account: 1 home cluster
          • Migratable accounts
      • Benefits:
          • Great horizontal growth
          • Better failure isolation
          • Independent scale
          • Load rebalancing
          • Versionable code
          • Differentiated service
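The features on this slide (one home cluster per account, many accounts per cluster, migratable accounts) suggest a directory that the application consults before touching an account's data. The sketch below is a hypothetical version of such a directory; class and cluster names are made up, and the actual data copy during a migration is deliberately left out.

```python
class ClusterDirectory:
    """Account-to-cluster directory: one home cluster per account, many accounts per cluster."""

    def __init__(self, clusters: list[str]) -> None:
        self.clusters = set(clusters)
        self.home: dict[int, str] = {}

    def _check(self, cluster: str) -> None:
        if cluster not in self.clusters:
            raise ValueError(f"unknown cluster {cluster!r}")

    def assign(self, account_id: int, cluster: str) -> None:
        self._check(cluster)
        if account_id in self.home:
            raise ValueError(f"account {account_id} already lives on {self.home[account_id]}")
        self.home[account_id] = cluster

    def cluster_for(self, account_id: int) -> str:
        return self.home[account_id]

    def accounts_on(self, cluster: str) -> list[int]:
        return sorted(a for a, c in self.home.items() if c == cluster)

    def least_loaded(self) -> str:
        # Cluster with the fewest accounts; ties are broken by name
        return min(sorted(self.clusters), key=lambda c: len(self.accounts_on(c)))

    def migrate(self, account_id: int, target: str) -> None:
        # Copying the account's rows to the target cluster is out of scope here;
        # only the directory entry is switched once that copy is complete.
        if account_id not in self.home:
            raise KeyError(account_id)
        self._check(target)
        self.home[account_id] = target


directory = ClusterDirectory(["cluster-1", "cluster-2"])
directory.assign(10, "cluster-1")
directory.assign(11, directory.least_loaded())
print(directory.cluster_for(11))  # cluster-2
```

Keeping placement in an explicit directory, rather than deriving it from a hash of the account ID, is what lets a single account move between clusters for load rebalancing without reshuffling every other account.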
  • 19. (Diagram: on the Users side, the global store plus per-cluster dash, audit, tags, events, and S3; on the Instances side, gateway, monitor, and routing)
  • 20. And partition our cloud objects based on the cloud the instances of an account run on: Islands
      (Diagram: gateway, monitor, and routing repeated per island, next to the Instances)
  • 21. Islands 1 to N, each with its own gateway, monitor, and routing, next to Clouds 1 to N
      • Gateway: MySQL
          • Master-Slave replication
          • Can port to NoSQL easily: mostly a resource cache, but cloud-partitionable
      • Monitoring: custom
          • Replicated files
          • Backup to S3
          • Archive to S3
      • Routing: Cassandra
          • Core agents: global data
          • Simpler key-value access
          • Very high availability
          • Great scalability
          • Great replica control
          • Plus cross-DC replication*
      • Features:
          • 1 instance: 1 home island
          • 1 island can serve N clouds
      • Benefits:
          • Services co-located with resources, close to cloud resources
          • Good failure isolation: as good as the cloud
          • Good scale: global replicas across Cassandra DCs
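The two features on this slide (one home island per instance, and one island serving N clouds) amount to a many-to-one mapping from clouds to islands. The sketch below is a hypothetical illustration with made-up island and cloud names; it assumes each cloud is served by exactly one island, which the slide implies but does not state outright.

```python
def build_cloud_index(islands: dict[str, set[str]]) -> dict[str, str]:
    """Invert island -> clouds into cloud -> island, rejecting clouds claimed twice."""
    index: dict[str, str] = {}
    for island, clouds in islands.items():
        for cloud in clouds:
            if cloud in index:
                raise ValueError(f"{cloud!r} is served by both {index[cloud]!r} and {island!r}")
            index[cloud] = island
    return index


def home_island(instance: dict[str, str], index: dict[str, str]) -> str:
    """An instance's home island is the island that serves the cloud it runs in."""
    return index[instance["cloud"]]


# Hypothetical layout: island-1 serves two clouds, island-2 serves one
index = build_cloud_index({"island-1": {"cloud-a", "cloud-b"}, "island-2": {"cloud-c"}})
print(home_island({"id": "i-1", "cloud": "cloud-b"}, index))  # island-1
```

Because the lookup depends only on the instance's cloud, gateway, monitoring, and routing data for that instance can all live in the same island, next to the resources they describe.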
  • 22. What if the cloud the cluster is deployed on… fails?
      (Diagram: Clusters 1 to N in different geographies; Islands 1 to N in different clouds)
  • 23. Sister Clusters: each cluster has a full replica in its sister
      • Features:
          • Each master has an extra remote slave
          • Each cluster in a pair is a DC replica of the other’s local ring
      • At disaster recovery time:
          • Apps are told to start serving an extra shard
          • No need to provision more infrastructure to recover (which we try to avoid, since everybody is in the same boat)
          • New resources can be allocated over time to help offload existing ones
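The recovery procedure on this slide is essentially a routing change: because each cluster already has a full replica in its sister, the surviving sister is told to serve one extra shard. The sketch below models only that bookkeeping. Class and cluster names are hypothetical, and the steps a real failover also needs (promoting the remote slaves, repointing clients) are not shown.

```python
class SisterClusters:
    """Pairs of clusters that replicate each other; on failure the sister serves both shards."""

    def __init__(self, pairs: list[tuple[str, str]]) -> None:
        self.sister: dict[str, str] = {}
        for a, b in pairs:
            self.sister[a] = b
            self.sister[b] = a
        # Initially every cluster serves only its own shard
        self.serving: dict[str, set[str]] = {c: {c} for c in self.sister}

    def fail(self, cluster: str) -> str:
        """Mark a cluster as failed and hand its shards to its sister; return the sister."""
        survivor = self.sister[cluster]
        if cluster not in self.serving or survivor not in self.serving:
            raise RuntimeError(f"cannot fail over {cluster!r}: pair already degraded")
        # No new hosts are provisioned: the sister already holds a full replica
        self.serving[survivor] |= self.serving.pop(cluster)
        return survivor

    def cluster_serving(self, shard: str) -> str:
        for cluster, shards in self.serving.items():
            if shard in shards:
                return cluster
        raise LookupError(shard)


pairs = SisterClusters([("cluster-1", "cluster-2"), ("cluster-3", "cluster-4")])
pairs.fail("cluster-1")
print(pairs.cluster_serving("cluster-1"))  # cluster-2
```

As on the slide, the survivor absorbs the extra shard on its existing capacity; moving that load onto newly allocated resources would be a later, separate step.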
  • 24. Conclusions
      • We have shown that RightScale uses multiple database technologies:
          • RDBMS: MySQL, for its ACID semantics and ‘queryability’
              • Using one master to N slaves for read-only scale and quick failure recovery
              • And read-only provisioning, to increase read-only availability and to scale remote systems
          • NoSQL: Cassandra, for availability and scalability
              • For higher read/write availability within a cluster
              • For fully replicated regions across the globe (for reads and writes!)
      • We have shown how RightScale combines them using different techniques:
          • It partitions resource data into Islands based on cloud proximity
              • This enables in-cloud polling and keeps monitoring/syslog data storage next to the instances
              • It provides routing availability, co-located with instances, for any world region
          • It partitions core data into Clusters based on account groups
              • To scale the core horizontally and independently, and to achieve account isolation/differentiation
              • It enhances fault isolation by assigning accounts to Clusters deployed away from their cloud resources
          • It maintains cluster pairs (sister sites)
              • To recover from full cloud region failures
              • Without requiring massive amounts of new resources to recover
  • 25. Questions?