A database is a core part of almost every project; each company picks its own engine and maintains it.
When moving to the cloud or building cloud solutions, it is important to know which database services your cloud provider offers.
Amazon Web Services provides several SQL and NoSQL solutions:
- RDS - hosted MySQL, MS SQL Server, PostgreSQL, etc.
- Redshift - a "petabyte-scale" data warehouse
- DynamoDB - a high-performance document DB
- ElastiCache - hosted Redis and Memcached
The right choice depends on many factors:
- amount of data
- data structure
- performance requirements
- price
- data safety
- high-availability requirements
3. UserReport
Developing products that help you learn about your audience
Started using AWS more than 5 years ago
Fully migrated to AWS more than 1.5 years ago
Processing 3 billion requests monthly
Generating batched reports based on 8 billion requests
Online reports on 300 million records
Have used ~50% of the services AWS provides
Totally happy with AWS
7. Captain Obvious’s notes
● RDS doesn't host a particular DB; it hosts an RDBMS
● Create your root user, then create separate users for each database/application
● Your instance is firewalled with security groups
● Advanced configuration is available through parameter groups (see the sketch below)
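Parameter groups can be managed through the API as well as the console. A minimal boto3 sketch, assuming a MySQL 5.6 instance; the group name and parameter values are illustrative:

import boto3

rds = boto3.client('rds')

# Create a custom parameter group for a MySQL 5.6 instance.
rds.create_db_parameter_group(
    DBParameterGroupName='myapp-mysql56',
    DBParameterGroupFamily='mysql5.6',
    Description='Custom settings for myapp')

# Override a parameter; 'pending-reboot' applies it at the next restart.
rds.modify_db_parameter_group(
    DBParameterGroupName='myapp-mysql56',
    Parameters=[{'ParameterName': 'max_connections',
                 'ParameterValue': '500',
                 'ApplyMethod': 'pending-reboot'}])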
8. Multi-AZ deployments for production workloads
● SLA: 99.95% monthly uptime
● Doubles the price
● Lets you maintain your database without downtime:
○ Minor updates
○ Major updates
○ Disk resize
○ EC2 instance upgrade
● Not supported for MS SQL Web, Express, or Standard editions
9. Pricing
RDS price = EC2 + EBS + license
On-Demand, or Reserved instances with an up-front payment
10. Backups
● Automated, with automated rotation
● Point-in-time restore
● A restore creates a new instance and deploys the desired version; it takes a while (see the sketch below)
● Manual backups via snapshots
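A minimal boto3 sketch of a point-in-time restore; the instance identifiers are illustrative:

import boto3

rds = boto3.client('rds')

# Restore to the latest restorable time; this spins up a brand-new
# instance next to the source instance rather than modifying it in place.
rds.restore_db_instance_to_point_in_time(
    SourceDBInstanceIdentifier='prod-db',
    TargetDBInstanceIdentifier='prod-db-restored',
    UseLatestRestorableTime=True)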
11. Advanced optimizations
● Read replicas
○ you can create highly available, read-only copies of your data on the fly
● Using ElastiCache for a performance boost
○ caching query results in Memcached can massively speed up repeated queries (sketched below)
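A minimal cache-aside sketch with pymemcache; the endpoint and run_expensive_query are hypothetical:

from pymemcache.client.base import Client

# Hypothetical ElastiCache Memcached endpoint; 11211 is the default port.
cache = Client(('myapp.abc123.cfg.use1.cache.amazonaws.com', 11211))

def fetch_report(report_id):
    key = 'report:%s' % report_id
    cached = cache.get(key)
    if cached is not None:
        return cached                          # cache hit: skip the database
    result = run_expensive_query(report_id)    # hypothetical slow SQL query
    cache.set(key, result, expire=300)         # cache for 5 minutes
    return result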
12. Downsides
● No control over the underlying EC2 instance for very advanced optimizations
● Backups work at the instance level
○ One RDS instance per DB
○ Or custom backups
● No Active Directory integration
● No cross-region replication
14. Aurora
Available and Durable
Amazon Aurora is designed to offer greater than 99.99% availability, replicating 6 copies of data across 3 Availability Zones and backing up data continuously to Amazon S3. Recovery from physical storage failures is transparent, and instance restarts typically take less than a minute.
15. Aurora
Highly Scalable
You can use Amazon RDS to scale your Amazon Aurora database instance up to 32 vCPUs and 244 GiB of memory. You can also add up to 15 Amazon Aurora Replicas across three Availability Zones to further scale read capacity. Amazon Aurora automatically grows storage as needed, from 10 GB up to 64 TB.
21. DynamoDB performance
● You provision read and write capacity (see the boto3 sketch below)
● DynamoDB is divided into shards; each shard has the following limits:
○ 2 GB of data
○ 3,000 Read Capacity Units
○ 2,000 Write Capacity Units
● Your requests can be throttled (the SDK handles retry logic in most cases)
● You can set up auto scaling for DynamoDB
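A minimal boto3 sketch of provisioning capacity at table creation and changing it later; the table and attribute names are illustrative:

import boto3

dynamodb = boto3.client('dynamodb')

# Provision read/write capacity when the table is created.
dynamodb.create_table(
    TableName='visitors',
    AttributeDefinitions=[{'AttributeName': 'visitor_id',
                           'AttributeType': 'S'}],
    KeySchema=[{'AttributeName': 'visitor_id', 'KeyType': 'HASH'}],
    ProvisionedThroughput={'ReadCapacityUnits': 100,
                           'WriteCapacityUnits': 50})

# Provisioned throughput can be changed later without downtime.
dynamodb.update_table(
    TableName='visitors',
    ProvisionedThroughput={'ReadCapacityUnits': 200,
                           'WriteCapacityUnits': 100})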
22. DynamoDB Streams
● Triggers on data changes (a minimal consumer is sketched below)
● Cross-region replication
● Elasticsearch integration lets you search your data:
https://aws.amazon.com/blogs/aws/new-logstash-plugin-search-dynamodb-content-using-elasticsearch/
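A minimal sketch of an AWS Lambda handler consuming stream records, assuming the stream is configured with view type NEW_IMAGE:

# Each invocation receives a batch of change records from the stream.
def handler(event, context):
    for record in event['Records']:
        if record['eventName'] in ('INSERT', 'MODIFY'):
            # Values arrive in DynamoDB's typed format, e.g. {'S': 'abc'}
            new_image = record['dynamodb'].get('NewImage', {})
            print(record['eventName'], new_image)
        elif record['eventName'] == 'REMOVE':
            print('REMOVE', record['dynamodb']['Keys'])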
23. Backups and maintenance
● All data is replicated across three nodes - no backup required
● Changing provisioned throughput does not degrade performance
● You can set up auto scaling for DynamoDB:
https://github.com/sebdah/dynamic-dynamodb
24. *hit happens
DynamoDB had a massive outage (high error rates on API requests) in N. Virginia
that affected:
● SQS
● CloudWatch
● Auto Scaling groups
● SNS
https://aws.amazon.com/message/5467D2/
27. Redis
● Extremely fast in-memory database
● Different data structures:
○ Sets
○ Lists
○ Sorted sets
○ HyperLogLog
○ Hashes
○ Geo data
28. Redis hosted in AWS
● Different versions supported
● Multi-AZ master/slave configuration maintained by Amazon
● Automated backups
● Monitoring with CloudWatch
● No way to patch Redis for your own needs (geeks like custom operations)
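The commands in the next two examples can be issued from any Redis client; for instance, a minimal redis-py sketch against a hypothetical ElastiCache endpoint:

import redis

# Hypothetical ElastiCache endpoint; 6379 is the default Redis port.
r = redis.StrictRedis(host='myapp.abc123.use1.cache.amazonaws.com', port=6379)

r.pfadd('visitors.20151001', 'user-42')    # add a visitor to a HyperLogLog
print(r.pfcount('visitors.20151001'))      # approximate unique-visitor count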
29. Example 1. Calculating unique visitors
# count a visitor in a HyperLogLog and read the approximate unique count
PFADD visitors.20151001 xxx
PFCOUNT visitors.20151001
# plain counter for page views
INCR pageviews.20151001
GET pageviews.20151001
30. Example 2. Working with sets
# users 1 and 2 add an item to their cart
SADD added_item_to_cart id1
SADD added_item_to_cart id2
SADD begin_checkout id1
# users who haven't begun checkout
SDIFFSTORE no_checkout added_item_to_cart begin_checkout
# users with a known email who haven't started checkout
SINTER known_email no_checkout
34. Redshift
● Multi-node cluster deployment that scales up to petabytes
● $1,000/TB/year
● Good for data mining
● Query execution takes minutes or hours
35. Table design
● Distribution key (DISTKEY) - how data is distributed across nodes (see the sketch below)
● Sort key (SORTKEY) - how data is sorted within a node
● Primary key, foreign keys, constraints - not enforced, only hints to the query optimizer
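A minimal sketch of the corresponding DDL, issued through psycopg2 (Redshift speaks the PostgreSQL wire protocol); the cluster endpoint, credentials, and table layout are illustrative:

import psycopg2

# Hypothetical Redshift cluster endpoint; 5439 is the default port.
conn = psycopg2.connect(
    host='myapp.abc123.us-east-1.redshift.amazonaws.com',
    port=5439, dbname='analytics', user='admin', password='...')

with conn.cursor() as cur:
    # DISTKEY picks the distribution column, SORTKEY the on-node sort order.
    cur.execute("""
        CREATE TABLE pageviews (
            user_id   BIGINT,
            url       VARCHAR(2048),
            viewed_at TIMESTAMP
        )
        DISTKEY (user_id)
        SORTKEY (viewed_at);
    """)
conn.commit()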
36. Uploading data
● From CSV
● From DynamoDB
● From EMR
● Bulk insert
http://docs.aws.amazon.com/redshift/latest/dg/r_COPY_command_examples.html
37. Loading data from S3
copy table
from 's3://mybucket/data/table.txt'
credentials 'aws_access_key_id=<access-key-id>;aws_secret_access_key=<secret-access-key>'
csv [gzip] [delimiter '|'];
38. Query Execution
● PostgreSQL-compatible syntax, with many features disabled
● No views
● No stored procedures
● Scalar user-defined functions were recently added
● 10 parallel queries
39. Getting query results
unload ('select * from mytable')
to 's3://mybucket/unload/result/'
credentials 'aws_access_key_id=<access-key-id>;aws_secret_access_key=<secret-access-key>';