AWS Activate webinar - Scalable databases for fast growing startups
 


Fast growing startups building high scale applications demand a lot from their infrastructure and in particular from their databases. Often, databases become the bottleneck of the startups’ technology stack, with the risk of inhibiting fast growth as they are not easy to set up, operate and scale in the cloud. This webinar focuses on how to build scalable databases in the Cloud and covers how to effectively combine the use of relational, NoSQL, and even data warehouse databases, which have become a reality for startups with the launch of Amazon Redshift.
Key takeaways:

Understand the trade-off between SQL and NoSQL and when to go for a hybrid model.
Best practices in setting up your database in the AWS cloud whether using managed services or managing it yourself.
Learn how to minimize the costs of your database with the right architecture and pricing models.
Who should attend:

DBAs
Startup CTOs
Developers
Engineers
Architects
Growth Hackers


AWS Activate webinar - Scalable databases for fast growing startups: Presentation Transcript

  • Scalable Databases for Fast Growing Startups • Blair Layton, Business Development Manager – Database Services, Amazon Web Services - APAC
  • Agenda • Self-Managed or Managed Database Services? • NoSQL or Relational? • Performance Tips and Tricks • How to scale from 1 to 10,000,000 users? • How do I save money? • Summary • Q&A
  • Self-Managed or Managed Database Services?
  • Why Managed Databases? (chart labels: backup & recovery, data load & unload; performance tuning; scripting & coding; security planning; install, upgrade, patch and migrate; documentation, licensing & training; values shown: 25%, 40%, 5%, 5%)
  • If You Host Your Databases On-premises • Power, HVAC, net • Rack & stack • Server maintenance • OS installation • OS patches • DB s/w installs • DB s/w patches • Database backups • Scaling • High availability • App optimization (the diagram marks the tasks that fall to "you")
  • If You Host Your Databases in EC2 • you: OS patches, DB s/w patches, Database backups, Scaling, High availability, DB s/w installs, App optimization • AWS: Power, HVAC, net, Rack & stack, Server maintenance, OS installation
  • If You Choose a Managed Database Service • Power, HVAC, net • Rack & stack • Server maintenance • OS installation • OS patches • DB s/w installs • DB s/w patches • Database backups • High availability • Scaling • App optimization (the diagram marks the tasks that fall to "you")
  • differentiated effort increases the uniqueness of an application
  • AWS Database Services: Scalable High Performance Application Storage in the Cloud • Amazon RDS • Amazon DynamoDB • Amazon Redshift • Amazon ElastiCache (stack diagram labels: Deployment & Administration, Application Services, Database, Compute, Storage, Networking, AWS Global Infrastructure)
  • Amazon RDS • Relational Databases • Fully managed; zero admin • MySQL, Oracle, Postgres, SQL Server • Trillions of I/O requests/month
  • Flipboard relies on Amazon RDS • Flipboard is an online magazine with millions of users and billions of “flips” per month • Uses Amazon RDS and its Multi-AZ capabilities to store mission critical user data "We were able to go from concept to delivered product in about six months with just a handful of engineers." - Greg Scallan, Chief Architect, Flipboard
  • Key Features • Manageability – Rapid deployment with pre-configured parameters – Patch Management – Monitoring and Metrics • Availability and Data Durability – Automated Backups and Point-In-Time-Recovery – DB Snapshots – Automatic Host Replacement (Single-AZ) – Multi-AZ deployments • Scalability – Push-Button Scaling (Storage, Memory and Compute) – Read Replicas
  • RDS for Production Workloads • Amazon RDS Configuration: Push-Button Scaling, Multi-AZ, Read Replicas, Provisioned IOPS • Goals: Improve Availability, Increase Throughput, Reduce Latency (diagram: a Region with a Multi-AZ deployment across two availability zones, Read Replicas, Push-Button Scaling and Provisioned IOPS)
  • Amazon ElastiCache • In-Memory Cache • Elastic and reliable • Memcached or Redis • Fully managed; zero admin
  • ElastiCache: Fully Managed Cache Service • Easy to Deploy – Deploy master-slave(s) configuration with a few button clicks or API calls • Easy to Migrate – Compatible with memcached or Redis – Existing code will work when you update node end points • Easy to Administer – ElastiCache automatically replaces failed nodes and patches software as needed – CloudWatch enables you to monitor cache performance metrics • Easy to Secure – Supports VPC and Security Group configurations • Easy to Scale – Provides assisted scale up and scale out capability
  • Application Server Hot Items Small, frequently-accessed items are ideal candidates for read caching • Reduce server-side latency to <1ms • Eliminate “hot spot” performance barriers • Offload heavy read activity from database
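The read-caching idea on the slide above can be sketched with a cache-aside lookup: check the cache first and fall back to the database only on a miss. This is a minimal illustration, not ElastiCache client code. A plain dict stands in for a Memcached or Redis node, and the class name `CacheAside`, the `load_from_db` callback and the TTL value are assumptions made for the example.

```python
import time
from typing import Callable, Dict, Tuple


class CacheAside:
    """Cache-aside reads; a dict stands in for a Memcached/Redis node (assumption)."""

    def __init__(self, load_from_db: Callable[[str], str],
                 ttl_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._load_from_db = load_from_db   # hypothetical database lookup
        self._ttl = ttl_seconds
        self._clock = clock
        self._store: Dict[str, Tuple[float, str]] = {}  # key -> (expiry, value)
        self.hits = 0
        self.misses = 0

    def get(self, key: str) -> str:
        now = self._clock()
        entry = self._store.get(key)
        if entry is not None and entry[0] > now:
            self.hits += 1                   # served from memory, database untouched
            return entry[1]
        self.misses += 1
        value = self._load_from_db(key)      # fall back to the database on a miss
        self._store[key] = (now + self._ttl, value)
        return value

    def invalidate(self, key: str) -> None:
        """Drop a key after the underlying row changes."""
        self._store.pop(key, None)
```

Small, frequently read items benefit most, because every hit after the first is one less query against the database. Writes still need an invalidation or a short TTL, otherwise readers may see stale values.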
  • Amazon DynamoDB • NoSQL Database • Durable low latency • Fully managed; zero admin • Massive and seamless scalability
  • Durable Low Latency – At Scale • WRITES: continuously replicated to 3 AZs; quorum acknowledgment; persisted to disk (custom SSD) • READS: strongly or eventually consistent; no trade-off in latency
  • Amazon Redshift • Petabyte scale • Massively parallel • Relational data warehouse • Fully managed; zero admin • a lot faster, a lot cheaper, a whole lot simpler
  • Parallelize and Distribute Everything • Load • Query • Resize • Backup • Restore (diagram labels: SQL Clients / BI Tools, Client VPC, Leader Node, three Compute Nodes of 16TB each, 10 GigE (HPC), Ingestion, Backup, Restore, Amazon S3)
  • Databases on EC2 • Any database that runs on Windows or Linux! • Why? • No managed service exists from AWS, e.g. MongoDB • Full control • Exceed limits of managed service, e.g. > 3TB of storage on RDS
  • NoSQL or Relational?
  • Not available on AWS Spectrum of Database Options SQL NoSQL Low Cost High Cost Do-it Yourself Fully Managed
  • Spectrum of Database Options SQL NoSQL Do-it Yourself Fully Managed
  • Spectrum of Database Options (SQL) • Fully Managed: MySQL, Oracle, SQL Server, PostgreSQL, Amazon Redshift • Do-it Yourself: MySQL, Oracle, SQL Server, PostgreSQL, MariaDB, Vertica, ParAccel…
  • Spectrum of Database Options (NoSQL) • Do-it Yourself: MongoDB, Cassandra, Redis, Memcache • Fully Managed: DynamoDB, ElastiCache (Memcache), ElastiCache (Redis), SimpleDB
  • Thinking About the Questions • Should I use SQL or NoSQL? • Should I use MySQL or PostgreSQL? • Should I use Redis, Memcache, or ElastiCache? • Should I use MongoDB, Cassandra, or DynamoDB?
  • Actually, Thinking About the Right Questions • What are my scale and latency needs? • What are my transactional and consistency needs? • What are my read/write, storage and IOPS needs? • What are my time to market and server control needs?
  • Factors to Consider: SQL vs. NoSQL
    – Application: SQL: app with complex business logic? NoSQL: web app with lots of users?
    – Transactions: SQL: complex transactions, joins, updates? NoSQL: simple data model, updates, queries?
    – Scale: SQL: developer managed. NoSQL: automatic, on-demand scaling.
    – Performance: SQL: developer architected. NoSQL: consistent, high performance at scale.
    – Availability: SQL: architected for fail-over. NoSQL: seamless and transparent.
    – Core Skills: SQL: SQL + Java/Ruby/Python/PHP. NoSQL: NoSQL + Java/Ruby/Python/PHP.
    Best of both worlds: it is possible to use SQL and NoSQL models in one app.
  • Performance Tips and Tricks
  • Performance Tips and Tricks • Understand your workload – Read:Write ratio, I/O requirements, CPU requirements • Identify bottlenecks – CPU, Memory, Disk I/O, Network latency/bandwidth – Use CloudWatch and OS metrics • Choose the right instance type – High CPU, High Memory, High Storage, etc. • Understand EBS!
  • EBS =
  • Amazon Elastic Block Store (EBS): Amazon EBS Magnetic • IOPS: ~100 IOPS steady-state, with best-effort bursts to hundreds; in practice varies between 40 and 200 IOPS. • Throughput: variable by workload, best effort to 10s of MB/s. • Latency: varies; reads typically <20 ms, writes typically <10 ms. • Capacity: as provisioned, up to 1 TB.
  • Amazon Elastic Block Store (EBS): Amazon EBS General Purpose • IOPS: 3 IOPS per GB consistent, with bursts to 3,000 IOPS. Bucket principle: the bucket fills up when I/O is not used and empties as it is used. • Throughput: variable by workload, best effort to 64 MB/s. • Latency: low and consistent. • Capacity: as provisioned, up to 1 TB.
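The bucket principle on the General Purpose slide can be illustrated with a small simulation. This is a simplified model under stated assumptions, not a description of how EBS is implemented: the 3 IOPS per GB baseline and the 3,000 IOPS burst ceiling come from the slide, while the bucket capacity of 5.4 million I/O credits, the one-second time step and the function name are assumptions made for the example.

```python
def simulate_burst_bucket(volume_gb, demand_iops, seconds,
                          burst_iops=3000, bucket_capacity=5_400_000):
    """Return the IOPS delivered in each second under a simple credit-bucket model.

    bucket_capacity is an assumed figure; it is not stated in the slides.
    """
    baseline = 3 * volume_gb           # 3 IOPS per GB (from the slide)
    credits = float(bucket_capacity)   # assume the bucket starts full
    delivered = []
    for _ in range(seconds):
        # The bucket refills at the baseline rate, up to its capacity.
        credits = min(bucket_capacity, credits + baseline)
        # Each I/O spends one credit; bursts are capped at burst_iops.
        served = min(demand_iops, burst_iops, credits)
        credits -= served
        delivered.append(served)
    return delivered
```

With a deliberately tiny bucket the effect is easy to see: `simulate_burst_bucket(100, 3000, 4, bucket_capacity=6000)` delivers the burst rate until the credits run out, then settles at the 300 IOPS baseline of a 100 GB volume.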
  • Amazon Elastic Block Store (EBS): Amazon EBS Provisioned IOPS • IOPS: within 10% of the provisioned rate (up to 4000 IOPS) for 99.9% of a given year. • Throughput: 16 KB per I/O = up to 64 MB/s, as provisioned. • Latency: low and consistent, at recommended queue depth (QD). • Capacity: as provisioned, up to 1 TB*
  • Why the *? • Just because Amazon EC2 sends more work doesn’t mean there’s enough bandwidth to handle it! (diagram: EC2, an I/O, EBS)
  • Why the *? • Without more bandwidth, more EBS volumes or higher PIOPS won’t help! (diagram: EC2, an I/O)
  • EBS-Optimized • Oh, YEAH!! (diagram: EC2, a “boatload” of I/O, EBS w/ PIOPS)
  • Architecting for Performance • IOPS consistency requires EBS-optimized instances • Maximum throughput delivered by Amazon EBS is limited by Amazon EC2 bandwidth • EBS throughput = EBS IOPS × Block size – Ex: 64 MB/s = 4000 IOPS × 16 KB • At the same throughput, smaller blocks allow more IOPS: Max 8k = 2x, Max 4k = 4x*, Max 2k = 8x* • *Maximum IOPS is also limited to ~100,000 per 32 vCPU, irrespective of block size/throughput.
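The throughput formula on the slide above is simple enough to check in a few lines. The sketch below assumes decimal units (1 MB = 1000 KB), which is what the slide's own 64 MB/s example implies; the function names are made up for the example.

```python
def ebs_throughput_mb_s(iops, block_size_kb):
    """EBS throughput = EBS IOPS x block size, in MB/s (1 MB = 1000 KB assumed)."""
    return iops * block_size_kb / 1000


def iops_for_throughput(throughput_mb_s, block_size_kb):
    """IOPS needed to move throughput_mb_s at a given block size."""
    return throughput_mb_s * 1000 / block_size_kb


# The slide's example: 4000 IOPS at 16 KB per I/O.
print(ebs_throughput_mb_s(4000, 16))   # 64.0

# Holding 64 MB/s constant, halving the block size doubles the IOPS required.
for block_kb in (16, 8, 4, 2):
    print(block_kb, iops_for_throughput(64, block_kb))
```

The loop reproduces the 2x, 4x and 8x multipliers on the slide: the same bandwidth carries 8,000 I/Os per second at 8 KB, 16,000 at 4 KB and 32,000 at 2 KB, subject to the separate per-instance IOPS limit noted there.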
  • Additional Hints • Mount partitions with “noatime” and “nodiratime” – Removes a write every time a read is done • Turn off file system read ahead if possible – Especially for OLTP systems • Use vendor storage solutions – Oracle ASM • Optimize kernel settings
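One way to act on the “noatime” hint is to audit which file systems are mounted without it. The sketch below parses text in the Linux `/proc/mounts` format (device, mount point, file system type, options) and lists the mount points that lack the option. The function name and the default list of file system types are assumptions made for the example.

```python
def mounts_missing_noatime(mounts_text, fs_types=("ext3", "ext4", "xfs")):
    """Return mount points of the given file system types mounted without noatime."""
    missing = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        _device, mount_point, fs_type, options = fields[:4]
        if fs_type in fs_types and "noatime" not in options.split(","):
            missing.append(mount_point)
    return missing


SAMPLE = """\
/dev/xvda1 / ext4 rw,relatime 0 0
/dev/xvdf /data ext4 rw,noatime,nodiratime 0 0
proc /proc proc rw,nosuid 0 0
"""
print(mounts_missing_noatime(SAMPLE))  # ['/']
```

On a Linux host the same function can be fed the live table with `mounts_missing_noatime(open("/proc/mounts").read())`.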
  • Scaling from 1 to 10,000,000 Users
  • So how do we scale?
  • Hi, I have NO IDEA what I am doing!!
  • So let’s start from day one, user one ( you )
  • Day One, User One: • A single instance could potentially take us to a few hundred to a few thousand users, depending on application complexity and traffic • No failover • No redundancy • Too many eggs in one basket (diagram: User, Amazon Route 53, Elastic IP, EC2 Instance)
  • “We’re gonna need a bigger box” • Simplest approach • Can now leverage PIOPS • High I/O instances • High memory instances • High CPU instances • High storage instances • Easy to change instance sizes • Will hit an endpoint eventually (instance sizes shown: t2.small, m3.2xlarge, r3.8xlarge)
  • Day Two, User >1 First let’s separate out our single host into more than one. • Web • Database – Make use of a database service? Web Instance Database Instance Elastic IP Amazon Route 53 User
  • Start with the right databases for the job
  • User >100 First let’s separate out our single host into more than one • Web • Database – Use RDS to make your life easier Web Instance Elastic IP RDS DB Instance Amazon Route 53 User
  • User > 1000 Next let’s address our lack of failover and redundancy issues • Elastic Load Balancing • Another web instance – In another Availability Zone • Enable Amazon RDS multi-AZ Web Instance RDS DB Instance Active (Multi-AZ) Availability Zone Availability Zone Web Instance RDS DB Instance Standby (Multi-AZ) Elastic Load Balancing Amazon Route 53 User
  • User >10Ks–100Ks (diagram: User, Amazon Route 53, Elastic Load Balancing, eight Web Instances and four RDS DB Instance Read Replicas spread over two Availability Zones, RDS DB Instance Active (Multi-AZ) in one zone and RDS DB Instance Standby (Multi-AZ) in the other)
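Read replicas only help if the application actually sends reads to them. The sketch below shows one simple way to split traffic: statements that start with SELECT rotate across the replicas, everything else goes to the primary. It is an illustration under stated assumptions, not an RDS feature; the class name and the host names are hypothetical.

```python
import itertools


class ReadWriteRouter:
    """Pick a database endpoint per statement (endpoints are hypothetical)."""

    def __init__(self, primary, replicas):
        self.primary = primary
        # Round-robin over the replicas; fall back to the primary if there are none.
        self._replicas = itertools.cycle(replicas) if replicas else None

    def endpoint_for(self, sql):
        words = sql.split(None, 1)
        is_read = bool(words) and words[0].upper() == "SELECT"
        if is_read and self._replicas is not None:
            return next(self._replicas)
        return self.primary


router = ReadWriteRouter(
    primary="primary.db.example.internal",
    replicas=["replica-1.db.example.internal", "replica-2.db.example.internal"],
)
print(router.endpoint_for("SELECT name FROM users WHERE id = 1"))
print(router.endpoint_for("UPDATE users SET name = 'x' WHERE id = 1"))
```

Replicas can lag behind the primary, so reads that must see a write that was just made are usually safer on the primary.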
  • This will take us pretty far honestly, but we care about performance and efficiency, so let’s clean this up a bit
  • Shift Some Load Around Let’s lighten the load on our web and database instances • Move static content from the web instance to Amazon S3 and CloudFront • Move dynamic content from Elastic Load Balancing to CloudFront • Move session/state and DB caching to ElastiCache or DynamoDB (diagram: User, Amazon Route 53, Amazon CloudFront, Amazon S3, Elastic Load Balancing, Web Instance, ElastiCache, Amazon DynamoDB, RDS DB Instance Active (Multi-AZ), Availability Zone)
  • User >500k+ (diagram: User, Amazon Route 53, Amazon CloudFront, Amazon S3, Elastic Load Balancing, DynamoDB; two Availability Zones, each with three Web Instances, ElastiCache and an RDS DB Instance Read Replica; RDS DB Instance Active (Multi-AZ) in one zone and RDS DB Instance Standby (Multi-AZ) in the other)
  • From 500K to 1 Million Users • Getting serious now • Significant user base • Plenty of attention if things go wrong • Interesting phase for startups with funding rounds
  • Time to make some radical improvements at the web & app layers
  • SOAing Move services into their own tiers or modules. Treat each of these as 100% separate pieces of your infrastructure and scale them independently. Use queues! Amazon.com and AWS do this extensively! It offers flexibility and greater understanding of each component.
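The “Use queues!” advice can be sketched as a producer handing jobs to a pool of independent workers. In the sketch below Python's in-process `queue.Queue` stands in for a managed queue such as Amazon SQS, and the function name, the handler and the worker count are assumptions made for the example.

```python
import queue
import threading

_STOP = object()  # sentinel telling a worker to exit


def process_with_workers(jobs, handler, worker_count=2):
    """Run handler(job) for every job on a pool of worker threads.

    Returns (results, failed_jobs). Result order is not guaranteed.
    """
    work_queue = queue.Queue()  # stand-in for a managed queue (assumption)
    results, failed = [], []
    lock = threading.Lock()

    def worker():
        while True:
            job = work_queue.get()
            try:
                if job is _STOP:
                    return
                try:
                    outcome = handler(job)
                except Exception:
                    with lock:
                        failed.append(job)   # keep going; one bad job is not fatal
                else:
                    with lock:
                        results.append(outcome)
            finally:
                work_queue.task_done()

    threads = [threading.Thread(target=worker) for _ in range(worker_count)]
    for thread in threads:
        thread.start()
    for job in jobs:
        work_queue.put(job)
    for _ in threads:
        work_queue.put(_STOP)   # one stop signal per worker
    work_queue.join()
    for thread in threads:
        thread.join()
    return results, failed
```

Because the producer only talks to the queue, the worker tier can be resized or restarted without touching the web tier, which is the independence the slide is after.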
  • Users > 1 Million (diagram: User, Amazon Route 53, Amazon CloudFront, Amazon S3, Elastic Load Balancer, four Web Instances, two Internal App Instances, two Worker Instances, Amazon SQS, Amazon SES, Amazon CloudWatch, ElastiCache, Amazon DynamoDB, RDS DB Instance Active (Multi-AZ), two RDS DB Instance Read Replicas, Availability Zone)
  • The next big steps
  • From 5 to 10 Million Users You may start to run into issues with your database around contention on the write master. How can you solve it? • Federation (splitting into multiple DBs based on function) • Sharding (splitting one data set up across multiple hosts) • Moving some functionality to other types of databases – NoSQL for hot tables, lookup tables, leaderboards/scoring, metadata – Data warehouse for analytics: user behavior, performance monitoring, A/B testing results, KPIs/dashboards.
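Federation and sharding, as described on the slide above, both come down to a routing decision made before a query is sent. The sketch below is one possible shape for that decision: federation picks a database by function, sharding picks a host by hashing the key. The database names, the `shard-N` naming and the choice of SHA-256 with a modulo are assumptions made for the example.

```python
import hashlib

# Federation: one database per function (names are hypothetical).
FEDERATED_DATABASES = {
    "users": "users-db",
    "orders": "orders-db",
    "forums": "forums-db",
}


def database_for_function(function_name):
    """Federation: route by what the data is for."""
    return FEDERATED_DATABASES[function_name]


def shard_for(user_id, shard_count):
    """Sharding: route one data set across hosts by hashing the key."""
    digest = hashlib.sha256(str(user_id).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


def host_for_user(user_id, shard_count=4):
    return "shard-{}".format(shard_for(user_id, shard_count))
```

A stable hash keeps each user on the same shard from one request to the next. Plain modulo hashing moves most keys when the shard count changes, so resharding needs a migration plan or a different scheme such as consistent hashing.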
  • How do I Save Money?
  • Saving $$$ • Use managed database services – Focus your limited resources on the application – ElastiCache can reduce your database costs • Understand how to scale from the start – Save redesign work and unhappy customers – Start and stop instances as required • Use the AWS platform – Don’t reinvent the wheel, concentrate on your core competency – Using CloudFront will reduce your costs on EC2 dramatically • Purchase RIs and use spot instances • Constantly monitor and right-size your environment
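Whether a Reserved Instance (RI) pays off depends on how many hours the instance actually runs. The sketch below computes the break-even point between on-demand pricing and an RI with an upfront fee. All prices in it are made-up placeholders, not AWS list prices, and the function name is hypothetical.

```python
def ri_breakeven_hours(on_demand_hourly, ri_upfront, ri_hourly):
    """Hours of use after which the RI becomes cheaper than on-demand.

    Returns None when the RI hourly rate is not lower than on-demand.
    """
    saving_per_hour = on_demand_hourly - ri_hourly
    if saving_per_hour <= 0:
        return None
    return ri_upfront / saving_per_hour


# Placeholder prices for illustration only.
hours = ri_breakeven_hours(on_demand_hourly=0.50, ri_upfront=1000.0, ri_hourly=0.25)
print(hours)             # 4000.0
print(hours / (24 * 365))  # fraction of a year the instance must run to break even
```

An always-on database server will usually run well past its break-even point, while instances that can be started and stopped as required may not, which is where on-demand or spot capacity fits better.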
  • Sorry, How do I Scale my Database?
  • Summary • Decide on self-managed or managed database services • Choose the right database for your use case and skillsets to start with • Use Multi-AZ for your infrastructure • Choose the right instance family and size for your workloads • Understand the 3 types of EBS (Magnetic, General Purpose and PIOPS) • Make use of self-scaling services (Elastic Load Balancing, Amazon S3, Amazon SNS, SQS, Amazon SES, etc.) • Build in redundancy at every level • Blend SQL & NoSQL wisely • Use a data warehouse to offload large analytical queries from your main database • Cache data both inside and outside your infrastructure • Purchase RIs and use Spot instances • Split tiers into individual services (SOA) • Use autoscaling once you are ready for it • Use automation tools in your infrastructure • Make sure you have good metrics, monitoring, and logging tools in place • Don’t reinvent the wheel
  • Q & A