Open the discussion: when building a system, we generally focus on application code and maintenance, but databases deserve special attention as well. Few developers understand databases well, and few teams have a dedicated DBA. Databases have been around for a long time, yet project teams focus on the app/web layer and hide the DB behind persistence frameworks (devs are isolated from the DB). From the app's point of view, the DB is a black box that answers SQL queries. Still, maintaining a DB presents several challenges, and we're going to see how typical DB-admin scenarios can be handled in AWS. Data is central to your application. Data should be available, fault tolerant, scalable. AWS is the perfect platform for this: it's there when you need it, it can grow to suit your needs, and it allows you to be fault tolerant.
Let's get started: here is the classic 3-tier application we all know, with web servers, app servers, and databases… If you want to run your relational DB on EC2, it's simple: start an instance, attach an EBS volume, install your favorite DB service, and you're good to go. Matt already showed you how to do this.
Sometimes you don't even have to install the DB server yourself; there are plenty of pre-packaged AMIs for most databases.
It's all about "removing the muck" -- Matt Wood ;) Compare.
Or you can create a database server with RDS. RDS is a managed MySQL server.
The first problem you need to solve when running a production database is …
EC2+EBS+RDBMS = Do It Yourself. Quickly explain how snapshots work: stored in S3 for high durability, and incremental. To snapshot a DB, don't forget to freeze the filesystem. Can be automated with the API/CLI.
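The DIY flow just described (freeze, snapshot, unfreeze, automate) can be sketched in Python. This is a minimal sketch, not the talk's own tooling: it uses today's boto3 SDK for illustration (in 2011 you would script the equivalent ec2-* command-line tools), and the volume id, region, and mount point are hypothetical.

```python
import subprocess
from datetime import datetime, timezone

def snapshot_description(db_name, when):
    """Consistent description so snapshots are easy to find (and prune) later."""
    return f"{db_name}-backup-{when.strftime('%Y%m%d-%H%M%S')}"

def freeze(mountpoint):
    # fsfreeze blocks writes so the snapshot is filesystem-consistent
    subprocess.run(["fsfreeze", "--freeze", mountpoint], check=True)

def unfreeze(mountpoint):
    subprocess.run(["fsfreeze", "--unfreeze", mountpoint], check=True)

def backup_db_volume(volume_id="vol-12345678", mountpoint="/var/lib/mysql"):
    """Snapshot the EBS volume holding the DB files (ids/paths are hypothetical)."""
    import boto3  # today's SDK, used here for illustration
    ec2 = boto3.client("ec2", region_name="eu-west-1")
    freeze(mountpoint)
    try:
        ec2.create_snapshot(
            VolumeId=volume_id,
            Description=snapshot_description("mydb", datetime.now(timezone.utc)),
        )
    finally:
        unfreeze(mountpoint)  # never leave the filesystem frozen
```

Run something like `backup_db_volume()` from cron for nightly backups; since EBS snapshots are incremental, only changed blocks are stored after the first one.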
RDS special feature. Talk about the backup window: 30 minutes (though the backup itself usually takes less than that). Can be disabled.
RDS automatically takes care of freezing the DB filesystem.
RDS keeps snapshots until you delete them
Same mechanism as automated backups, but these snapshots are triggered by the user/DBA. RDS keeps snapshots until you delete them.
Can be automated using the API/CLI. You have to maintain your snapshots and schedule backups manually.
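Since manual snapshots are kept until you delete them, a scheduled script usually pairs snapshot creation with pruning. A sketch, again using the modern boto3 SDK for illustration with a hypothetical instance id; the retention logic is the only essential part:

```python
from datetime import datetime, timedelta, timezone

def snapshots_to_delete(snapshots, retention_days, now):
    """snapshots: list of (snapshot_id, created_at) pairs.
    Returns the ids older than the retention window, oldest first."""
    cutoff = now - timedelta(days=retention_days)
    old = [(sid, ts) for sid, ts in snapshots if ts < cutoff]
    return [sid for sid, _ in sorted(old, key=lambda p: p[1])]

def snapshot_and_prune(instance="mydb", retention_days=30):
    """Take a manual RDS snapshot, then delete expired ones (instance id is hypothetical)."""
    import boto3
    rds = boto3.client("rds", region_name="eu-west-1")
    now = datetime.now(timezone.utc)
    rds.create_db_snapshot(
        DBInstanceIdentifier=instance,
        DBSnapshotIdentifier=f"{instance}-manual-{now:%Y%m%d%H%M%S}",
    )
    resp = rds.describe_db_snapshots(DBInstanceIdentifier=instance,
                                     SnapshotType="manual")
    snaps = [(s["DBSnapshotIdentifier"], s["SnapshotCreateTime"])
             for s in resp["DBSnapshots"]]
    for sid in snapshots_to_delete(snaps, retention_days, now):
        rds.delete_db_snapshot(DBSnapshotIdentifier=sid)
```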
RDS keeps snapshots until you delete them
Test and prod environments can be twins!! It's easy to create test environments. During development, create a test environment based on real data. Noticed how easy it is to create a server from snapshots/backups? You can use this to create test/dev servers too! Differences between development and production environments can cause failures when promoting one to the other. Something as small as a different instance size can cause issues.
Because the DB can be the bottleneck of your application, you need to keep an eye on it. You also want to avoid outages due to rookie mistakes (disk full) and be the first to know when your DB performance degrades.
CloudWatch! But you're free to use Nagios, Cacti, …
RDS monitoring gives you more metrics. You can create an alarm that sends an email when a specific value goes above or below a threshold.
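A disk-full alarm of the kind just described could look as follows; the boto3 SDK, instance name, and SNS topic ARN are illustrative assumptions. The one fact to get right is that CloudWatch reports RDS FreeStorageSpace in bytes:

```python
def gb_to_bytes(gb):
    """RDS FreeStorageSpace is reported to CloudWatch in bytes."""
    return int(gb * 1024 ** 3)

def create_low_storage_alarm(instance="mydb", free_gb=5):
    """Alert when free storage drops below a threshold (names/ARN are hypothetical)."""
    import boto3
    cw = boto3.client("cloudwatch", region_name="eu-west-1")
    cw.put_metric_alarm(
        AlarmName=f"{instance}-low-free-storage",
        Namespace="AWS/RDS",
        MetricName="FreeStorageSpace",
        Dimensions=[{"Name": "DBInstanceIdentifier", "Value": instance}],
        Statistic="Average",
        Period=300,                       # five-minute average
        EvaluationPeriods=2,              # must breach twice in a row
        Threshold=gb_to_bytes(free_gb),
        ComparisonOperator="LessThanThreshold",
        AlarmActions=["arn:aws:sns:eu-west-1:123456789012:dba-alerts"],
    )
```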
By the way, if you need to access your DB from outside the AWS datacentre, you can connect to your RDS instance using SSL… The public key can be downloaded from AWS.
Create a security group for databases that can only be accessed by app servers (= source). Security groups act as a firewall around a group of instances and allow you to control access to your [DB] instances.
You can authorize specific IP address ranges or specific EC2 security groups
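Authorizing the app-server group as the only source might be scripted like this; boto3 is used for illustration, and both security group ids are hypothetical:

```python
def mysql_ingress_from_group(app_sg_id):
    """Ingress rule: MySQL port 3306, reachable only from members of the app-server group."""
    return [{
        "IpProtocol": "tcp",
        "FromPort": 3306,
        "ToPort": 3306,
        "UserIdGroupPairs": [{"GroupId": app_sg_id}],
    }]

def lock_down_db_group(db_sg_id="sg-11111111", app_sg_id="sg-22222222"):
    """Allow only the app-server group as a source (group ids are hypothetical)."""
    import boto3
    ec2 = boto3.client("ec2", region_name="eu-west-1")
    ec2.authorize_security_group_ingress(
        GroupId=db_sg_id,
        IpPermissions=mysql_ingress_from_group(app_sg_id),
    )
```

Referencing a security group instead of IP ranges means new app servers are covered automatically as they launch.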
Use a bastion host. It can be launched via the CLI. You open a maintenance door to your realm only when needed.
Bastion host can be terminated or stopped
Only enable certain users to delete DB instances, etc. + show MFA.
Create User Identities - add Users (unique identities that can interact with AWS services) to your AWS account. A User can be an individual, a system, or an application that needs access to AWS services.
Assign and Manage Security Credentials - assign security credentials such as access keys to each User, with the ability to rotate or revoke them as needed.
Organize Users in Groups - create IAM Groups to simplify the management of permissions for multiple Users.
Centrally Control User Access - control the operations that each User can perform, including access to APIs for specific AWS services and resources.
Add Conditions to Permissions - use conditions such as time of day, source IP address, or protocol (e.g. SSL) to control how and when a User can access AWS.
View a Single AWS Bill - receive a single bill that represents the activity of all Users within a single AWS account.
The bottom line is that IAM allows you to control user access to the AWS API.
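One way to express "only certain users may delete DB instances, and only with MFA" is an IAM policy document. A sketch of the deny-unless-MFA half, built as a Python dict (the Version string is the current policy-language version; attach the corresponding Allow statements to your DBA group separately):

```python
import json

# Deny deleting RDS instances unless the caller authenticated with MFA.
deny_delete_without_mfa = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Deny",
        "Action": ["rds:DeleteDBInstance"],
        "Resource": "*",
        "Condition": {"BoolIfExists": {"aws:MultiFactorAuthPresent": "false"}},
    }],
}

# Serialize to the JSON form that IAM expects
policy_json = json.dumps(deny_delete_without_mfa, indent=2)
```

`BoolIfExists` also denies callers whose credentials carry no MFA context at all (e.g. plain access keys), which is usually what you want.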
IAM also allows you to give console access to some of your users (developers, etc.)
What happens if someone steals the DBA password? There are different mitigation techniques (key rotation, etc.), but the best one is …
Signing up is easy: you purchase a device from Gemalto ($13) and register it in AWS.
Reliability of my databaseMulti-AZ
Talk about AZs + synchronous replication + failover + maintenance, backups, etc. When you select this option, Amazon automatically provisions and maintains a synchronous standby replica in a different Availability Zone. The primary DB Instance is synchronously replicated across Availability Zones to the standby replica to provide data redundancy. Note that you can't use the standby for reads/writes.
In the event of a planned or unplanned outage of your primary DB Instance, Amazon RDS automatically switches to the standby replica. The automatic failover mechanism simply changes the canonical name record (CNAME) of the main DB Instance to point to the standby DB Instance
Keep in mind that multi-AZ deployments are not a scaling solution for reads and do not allow you to use the standby replica to serve read traffic.
Snapshot your existing volume. Create a new volume from the snapshot. Shut down your database. Unmount the old volume. Mount the new volume. Restart your database. This can be scripted, but the drawback is downtime.
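The snapshot-swap steps above can be scripted; this sketch assumes boto3, hypothetical ids, and that the DB is already stopped with the old volume unmounted. The small sizing helper reflects the 1 TB per-volume EBS limit of the time:

```python
def next_volume_size(current_gb, cap_gb=1024):
    """Double the volume, capped at the 1 TB EBS per-volume limit."""
    return min(current_gb * 2, cap_gb)

def grow_db_volume(instance_id="i-db123456", old_volume="vol-old12345",
                   size_gb=200, az="eu-west-1a", device="/dev/sdf"):
    """Scripted volume swap; all ids are hypothetical, and the DB must already
    be shut down with the old volume unmounted."""
    import boto3
    ec2 = boto3.client("ec2", region_name="eu-west-1")
    # 1. Snapshot the existing volume
    snap = ec2.create_snapshot(VolumeId=old_volume, Description="grow-db-volume")
    ec2.get_waiter("snapshot_completed").wait(SnapshotIds=[snap["SnapshotId"]])
    # 2. Create a larger volume from the snapshot, in the instance's AZ
    new_vol = ec2.create_volume(SnapshotId=snap["SnapshotId"], Size=size_gb,
                                AvailabilityZone=az)
    # 3. Swap the volumes
    ec2.detach_volume(VolumeId=old_volume, InstanceId=instance_id)
    ec2.get_waiter("volume_available").wait(VolumeIds=[new_vol["VolumeId"]])
    ec2.attach_volume(VolumeId=new_vol["VolumeId"], InstanceId=instance_id,
                      Device=device)
    # then mount the new volume on the instance and restart the DB
```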
You can change the instance class of a DB instance in a matter of minutes, so don't spend too much time figuring out the best size up front … do some real tests! Without downtime.
Allocated storage in GB.
At some point, a growing application just gets bigger than the architecture can support."There is no silver bullet" -> devs will understand this !
Start with the easy solution: in the "real world", you buy faster hardware. Here, you use larger instance types.
It's easy to move to bigger instances: stop the instance, start a larger one. Mount ???
You can change the instance class of a DB instance in a matter of minutes, so don't spend too much time figuring out the best size up front … do some real tests!
Lots of reads ?
A master-slave replicated cluster is a set of databases that sync data in a single direction. The master database is the custodian of all data and is the one you write to: inserts, deletes, and updates. The slave database replicates data from the master and holds a copy of it. This is the one you read from: select statements. This separation frees up resources on the master, which is often CPU bound, and lets you do joins again without killing overall performance, since the slave handles the operation, not the master. RDS read replicas use MySQL's asynchronous replication. + transparent app integration (proxy, JDBC drivers, …) + different sizes for RRs + "autoscale" RRs. But if replication lag gets too big, your app must be aware of this: avoid read-after-write.
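The read-after-write caveat is usually handled with a small routing layer in the app. A self-contained sketch (no AWS calls; the thresholds are illustrative): writes go to the master, and reads also go to the master when the session wrote very recently or the replica lags too far behind:

```python
import time

class ReadWriteRouter:
    """Route writes to the master; route reads to a replica unless the session
    wrote recently (read-after-write) or the replica lags too far behind."""

    def __init__(self, max_lag_s=5.0, read_after_write_s=2.0, clock=time.monotonic):
        self.max_lag_s = max_lag_s
        self.read_after_write_s = read_after_write_s
        self.clock = clock
        self.last_write = {}  # session id -> timestamp of that session's last write

    def note_write(self, session):
        self.last_write[session] = self.clock()

    def pick(self, session, replica_lag_s):
        wrote_recently = (
            self.clock() - self.last_write.get(session, float("-inf"))
            < self.read_after_write_s
        )
        if wrote_recently or replica_lag_s > self.max_lag_s:
            return "master"
        return "replica"
```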
Some people fear RRs because they think of clusters, operational cost, and management complexity… But in AWS they're easy to set up (RDS read replicas).
When you create a RR, RDS takes a snapshot of the source DB and begins replication. As a result, you will experience a brief I/O suspension on your source DB Instance while the snapshot is taken. The I/O suspension is mitigated if the source DB Instance is a Multi-AZ deployment, because the snapshot is taken from the Multi-AZ standby.
Noticed in the previous popup that you can pick the AZ for a new RR? Because RRs can be deployed in other AZs, you can put RRs where another website deployment exists. Follow the high-availability recommendations from Matt's presentation. Why just 5 RRs per DB instance? There's a point of diminishing returns: apps that mostly read data tend to scale out better (writes don't scale out at all; you're duplicating write queries on a bunch of machines, since writes must be executed on every machine for replication to work). How busy is the master with writes? If the master is busy with writes, the slaves are too! -> as the master reaches its limits, the effectiveness of scaling out with replication drops.
Measure replication lag with CloudWatch. Smart DB load balancers can use this metric to avoid sending queries to RRs that lag behind.
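Lag-aware replica selection could be sketched like this; the CloudWatch half uses boto3 for illustration with hypothetical replica ids, while `choose_replica` is plain logic:

```python
def choose_replica(lags, max_lag_s):
    """lags: replica id -> ReplicaLag seconds. Returns the least-lagged replica
    under the threshold, or None to fall back to the master."""
    healthy = {rid: lag for rid, lag in lags.items() if lag <= max_lag_s}
    return min(healthy, key=healthy.get) if healthy else None

def current_lags(replica_ids=("mydb-rr1", "mydb-rr2")):
    """Fetch recent ReplicaLag averages from CloudWatch (replica ids are hypothetical)."""
    import boto3
    from datetime import datetime, timedelta, timezone
    cw = boto3.client("cloudwatch", region_name="eu-west-1")
    now = datetime.now(timezone.utc)
    lags = {}
    for rid in replica_ids:
        resp = cw.get_metric_statistics(
            Namespace="AWS/RDS", MetricName="ReplicaLag",
            Dimensions=[{"Name": "DBInstanceIdentifier", "Value": rid}],
            StartTime=now - timedelta(minutes=10), EndTime=now,
            Period=300, Statistics=["Average"],
        )
        points = sorted(resp["Datapoints"], key=lambda p: p["Timestamp"])
        if points:
            lags[rid] = points[-1]["Average"]
    return lags
```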
Use RRs to run BI applications. BI or reporting apps only read the database, but they perform heavy queries (joins, stats, etc.) that impact the production database (-> degrade performance). BI apps can run on temporary read replicas (remember the API/CLI! This can be scripted).
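Spinning up a throwaway reporting replica via the API might look like this; boto3 is used for illustration, and the source instance id and instance class are assumptions:

```python
from datetime import date

def report_replica_id(source_id, day):
    """Predictable identifier for a throwaway reporting replica."""
    return f"{source_id}-report-{day:%Y%m%d}"

def run_reporting_replica(source="mydb"):
    """Create a temporary RR, run the BI job against it, then tear it down."""
    import boto3
    rds = boto3.client("rds", region_name="eu-west-1")
    replica = report_replica_id(source, date.today())
    rds.create_db_instance_read_replica(
        DBInstanceIdentifier=replica,
        SourceDBInstanceIdentifier=source,
        DBInstanceClass="db.m1.large",   # can differ from the source instance
    )
    rds.get_waiter("db_instance_available").wait(DBInstanceIdentifier=replica)
    # ... point the BI/reporting job at the replica endpoint, then:
    rds.delete_db_instance(DBInstanceIdentifier=replica, SkipFinalSnapshot=True)
```

The heavy joins then never touch the production master, and you only pay for the replica while the report runs.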
Lots of reads ?
From the source: memcached is a high-performance, distributed memory object caching system, generic in nature, but intended for use in speeding up dynamic web applications by alleviating database load. Remember: caching makes sense if it takes more time to get data out of the database than it does to get it from the cache. Like all caches, if your application's data-access pattern is completely random, caching will not help you; it will make things worse: lots of cache misses. Still, EC2 makes it easy to adjust the size of your memcached fleet. Also: if you want to write data that you will need back later, memcached will not help you. Strategy: cache everything that is slow to query, fetch, or calculate. Memcached is generic, not only for DBs: you can cache complete portions of web pages that are expensive to compute and don't change too often (tag clouds, etc.). Some people might argue that you can use caches for write-heavy scenarios by writing directly to the cache, but that's a nightmare to manage, and if a cache fails before you commit to the database, your data is lost. Whereas in read scenarios, if you lose a cache, you recreate the data from the database. One problem in scaling a memcached fleet is maintaining the indexes and pre-warming the caches.
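The cache-aside strategy described here can be sketched with a plain dict standing in for a memcached client, so the example runs anywhere (with a real client such as python-memcached, the get/set pattern is the same idea); TTL and keys are illustrative:

```python
import time

class CacheAside:
    """Cache-aside: read from the cache first, fall back to the DB on a miss,
    then populate the cache with a TTL."""

    def __init__(self, ttl_s=300, clock=time.monotonic):
        self.ttl_s, self.clock, self.store = ttl_s, clock, {}
        self.hits = self.misses = 0

    def get(self, key, load_from_db):
        entry = self.store.get(key)
        if entry is not None and self.clock() - entry[1] < self.ttl_s:
            self.hits += 1
            return entry[0]
        self.misses += 1
        value = load_from_db(key)            # the slow query we want to avoid repeating
        self.store[key] = (value, self.clock())
        return value
```

The hit/miss counters matter in practice: if the hit rate stays low (the "completely random access pattern" case above), the cache is only adding latency.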
Lots of writes? As an application gets bigger, or if the read/write ratio is low, the write workload eventually gets too intensive for some part of the system to handle. What you see is that RRs can't keep up with the master anymore. (In sync replication, the master is subjected to extra load + waits for slaves to execute the changes it sends them -> the master shows strain first.)
>> separate parts in your database schema (e.g. in a blog, separate articles from comments)
You can implement scaling techniques for each partition, e.g. use different instance sizes per partition depending on load (use CloudWatch to measure this). Works well combined with RRs. The downside is complexity: your app must be aware of which DB it uses, and there are more backups to track (slightly more difficult to restore the whole system). The problem is that one partition can grow larger than what a single master + RRs can handle, and then you're back to the starting point.
Sharding = break a single logical dataset apart and distribute it across several servers. All database servers share the same schema, but they hold different datasets. Advantages: write scaling (to be efficient, the write workload should be divided into completely separate sets of servers) and the ability to add more capacity as the dataset grows. Properly designed sharded architectures can scale linearly with added servers. But for this, you need complete isolation between servers (so you don't add overhead when you add servers). Some applications fit well with this paradigm (if yours does, you're lucky!!). Typical examples are multitenant apps where each user's data is completely independent (e.g. Salesforce). User id + mod (or hash) can be used as the key to find which server it maps to. Unfortunately, a lot of apps don't have a single clear sharding key (things get worse with social apps) -> data duplication and denormalization are often required. Downside: a lot of queries become hard or impossible to perform in sharded environments. Data is processed separately, then aggregated in the application code.
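The "user id + mod (or hash)" mapping mentioned above amounts to a few lines; the shard endpoints are hypothetical:

```python
def shard_for(user_id, num_shards):
    """Map a user to a shard: every server has the same schema,
    but each holds a disjoint slice of the users."""
    return user_id % num_shards

# Hypothetical shard endpoints, one DB server (or RDS instance) per shard
SHARDS = ["db-shard-0.example.com", "db-shard-1.example.com",
          "db-shard-2.example.com", "db-shard-3.example.com"]

def endpoint_for(user_id):
    return SHARDS[shard_for(user_id, len(SHARDS))]
```

Note the hidden cost of plain modulo: adding a shard remaps most users, forcing a mass data migration. A directory table or consistent hashing keeps the remapping local, which is why larger sharded systems tend to use one of those instead.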
Offload static content and big BLOBs to S3.
Increasingly popular. No relational schema, eventual consistency, designed for availability (read & write performance), no transaction locking. Designed for scale, with fault tolerance built in. Key/value stores offer significant performance gains. Others take a document approach, excellent for unstructured data. Social games use them a lot. Lots of choices available!
Many applications do not require the overhead introduced by a full-fledged RDBMS: they don't require complex transactions or joins, they simply want to store data items (set it and forget it), and they want to be free from the scaling, availability, and data-model constraints of a relational database. For users who: principally use index and query functions rather than more complex relational database functions; don't want any administrative burden at all in managing their structured data; want a service that scales automatically up or down in response to demand, without user intervention; require the highest availability and can't tolerate downtime for data backup or software maintenance.
Limits. Current limits within Amazon SimpleDB:
- Domain size: 10 GB per domain; 1 billion attributes per domain
- Domain name: 3-255 characters (a-z, A-Z, 0-9, '_', '-', and '.')
- Domains per account: 250
- Attribute name-value pairs per item: 256
- Attribute name length: 1024 bytes
- Attribute value length: 1024 bytes
- Item name length: 1024 bytes
- Allowed characters (attribute names, attribute values, item names): all UTF-8 characters that are valid in XML documents; control characters and sequences not valid in XML are returned Base64-encoded (see Working with XML-Restricted Characters)
- Attributes per PutAttributes operation: 256
- Attributes requested per Select operation: 256
- Items per BatchPutAttributes operation: 25
- Maximum items in a Select response: 2500
- Maximum query execution time: 5 seconds
- Maximum unique attributes per Select expression: 20
- Maximum comparisons per Select expression: 20
- Maximum response size for Select: 1 MB
Use cases. RDS: gumi, one of the largest social gaming companies in Japan, relies on Amazon RDS to enable 10 million unique users to play daily on its gaming platform built on AWS. Amazon.com's Customer Experience Analytics team uses Amazon RDS to store and query customer simulation data. SimpleDB: Netflix. Alexa stores over 12 million objects in Amazon SimpleDB and performs over 5 million queries against it daily.
AWS Tech Summit - Berlin 2011 - Choosing and Running Databases on AWS
AWS Identity and Access Management (IAM)
Create User Identities
Assign and Manage Security Credentials
Organize Users in Groups
Centrally Control User Access
Add Conditions to Permissions