
DAT101 Understanding AWS Database Options - AWS re:Invent 2012



When you're handling big data in the modern world, you will come to a point where you can't just pick a “one size fits all” approach anymore. However, to get the results you want, you also don’t have to spend big money on fire-breathing hardware or expensive software. AWS offers a broad array of open and commercial database choices, from do-it-yourself to fully managed services that handle scaling, and gives you powerful tools to choose the right architecture. You can choose from MySQL, RDS, Oracle, SQL Server, MongoDB, DynamoDB, Cassandra, ElastiCache, Redis, and SimpleDB, and our customers use them for different use cases. Each has different strengths, and this session highlights when you would want to choose each, with examples of how we use each to solve our big data challenges and why we made those decisions. We profile some of the choices available to you (MySQL, RDS, ElastiCache, Redis, Cassandra, MongoDB and DynamoDB) and present three customer case studies on RDS, ElastiCache and DynamoDB.



  1. 1. AWS Database Options and Decision Factors • Best Practice Tips and Techniques: Optimizing for Manageability and Scale (Edmodo); Optimizing for App Velocity and Scale (Obama for America); Leveraging YesSQL and NoSQL (BrandVerity) • Q&A
  2. 2. Before We Begin
  3. 3. Easily and rapidly analyze petabytes of data • 1/10 the cost of traditional data warehouses • Automated deployment & administration • Compatible with popular BI tools
  4. 4. Amazon Redshift • Diagram: common BI tools connect over JDBC/ODBC to a leader node, with compute nodes on a 10GigE mesh • Choose from 16TB local disk / 128GB RAM or 2TB local disk / 16GB RAM nodes • Configure up to 100 nodes for up to 1.6PB • Data stored in columnar format for 10X I/O efficiencies and fast queries • Query with standard SQL and JDBC/ODBC
  5. 5. Your BI Tools • ODBC / JDBC PostgreSQL drivers • Amazon Redshift
  6. 6. 1. Zero to App in ____ Minutes • 2. Zero to Millions of users in ____ Days • 3. Zero to “IPO” in ____ Months
  7. 7. 1. Zero to App in ____ Minutes • 2. Zero to Millions of users in ____ Days • 3. Zero to “IPO” in ____ Months
  8. 8. Focus on your App
  9. 9. Load balancer • Application tier • Database tier
  10. 10. Load balancer: Security, Scale, Availability… • Application tier: Security, Innovation, Scale, Performance, Availability… • Database tier: Security, Innovation, Scale, Transactions, Performance, Durability, Availability, Skills…
  11. 11. SQL / NoSQL • Do-it-Yourself / Fully Managed • Not available on AWS • Low Cost / High Cost
  12. 12. SQL / NoSQL • Do-it-Yourself / Fully Managed
  13. 13. SQL • Do-it-Yourself: MySQL, Oracle, SQL Server, MariaDB, Postgres… • Fully Managed: MySQL, Oracle, SQL Server
  14. 14. NoSQL • Do-it-Yourself: MongoDB, Cassandra, Redis, Memcache • Fully Managed: DynamoDB, ElastiCache, SimpleDB
  15. 15. Should I use SQL or NoSQL? • Should I use MySQL on EC2 or RDS? • Should I use MongoDB, Cassandra, or DynamoDB? • Should I use Redis, Memcache, or ElastiCache?
  16. 16. What are my transactional and consistency needs? • What are my scale and latency needs? • What are my read/write, storage and IOPS needs? • What are my time to market and server control needs?
  17. 17. Factors | SQL | NoSQL
      Application | App with complex business logic? | Web app with lots of users?
      Transactions | Complex txns, joins, updates? | Simple data model, updates, queries?
      Scale | Developer managed | Automatic, on-demand scaling
      Performance | Developer architected | Consistent, high performance at scale
      Availability | Architected for fail-over | Seamless and transparent
      Core Skills | SQL + Java/Ruby/Python/PHP | NoSQL + Java/Ruby/Python/PHP
      Best of both worlds: Possible to use SQL and NoSQL models in one app
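
The "best of both worlds" point on slide 17 can be illustrated with a small sketch. It is a toy under stated assumptions, not something shown in the talk: Python's built-in `sqlite3` stands in for a SQL engine such as MySQL, a plain dict stands in for a key-value store such as DynamoDB, and the `accounts` table, the `sessions` store, and all function names are hypothetical.

```python
import sqlite3

# Assumption: sqlite3 stands in for a SQL engine such as MySQL.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER NOT NULL)")
db.executemany("INSERT INTO accounts (id, balance) VALUES (?, ?)", [(1, 100), (2, 50)])
db.commit()

# Assumption: a dict stands in for a key-value store such as DynamoDB.
sessions = {}


def transfer(src, dst, amount):
    """Multi-row update that must succeed or fail as a unit (the SQL side)."""
    with db:  # commits on success, rolls back if the block raises
        db.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src))
        db.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst))
        (remaining,) = db.execute(
            "SELECT balance FROM accounts WHERE id = ?", (src,)
        ).fetchone()
        if remaining < 0:
            raise ValueError("insufficient funds")


def get_balance(account_id):
    """Read one account balance from the relational store."""
    (value,) = db.execute(
        "SELECT balance FROM accounts WHERE id = ?", (account_id,)
    ).fetchone()
    return value


def record_page_view(user_id, page):
    """Single-key write with no joins or transactions (the key-value side)."""
    sessions[user_id] = {"last_page": page}


transfer(1, 2, 30)
record_page_view("user-1", "/home")
```

The split follows the slide's own factors: data that needs multi-row transactions goes to the relational store, while simple per-key state goes to the key-value store.
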
  18. 18. Factors | Do it Yourself (DIY) | Fully Managed
      Replication | Granular, app managed | Transparent and configured
      Monitoring | Specific agents and custom | Automated and API driven
      Security | Root access, custom configs | Hardened by the service
      Resources | Requires more DBA resources and time | Requires less DBA resources and time
      Time to market | Sophistication vs. speed | Rapid iteration
      Core Skills | Systems, databases, monitoring | Applications, user focused
      Best of both worlds: Possible to manage different tiers differently
  19. 19. Amazon RDS is a fully managed SQL database service. • Choice of database engines • Simple to deploy and scale • Reliable and cost effective • Without any operational burden.
  20. 20. Focus on the “innovation”: Migration, Schema design, Query construction, Query optimization • Offload the “administration”: Backup and recovery, Patching, Configuration, Software upgrades, Storage upgrades, Frequent server upgrades, Hardware crash
  21. 21. Available: Multiple databases per instance • Standard user accounts • Connect and query using common MySQL tools & drivers • Tune engine parameters • Import and export data using standard MySQL tools (mysqldump) • Diagnostics • Native MySQL replication • SSL for encryption over the wire • Monitor metrics. Not available: Shell, super user or direct file system access (Think security!)
  22. 22. ElastiCache is a fully managed Memcache caching service. • Easy to set up and operate • Scale cache clusters with push button ease • Ultra fast response time for read scaling • Without any operational burden.
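
The "read scaling" point on slide 22 is commonly realized with a cache-aside pattern, sketched below. This is a minimal illustration under assumptions: `TTLCache` is a dict-backed stand-in for a Memcache-style cache (not the ElastiCache or memcached client API), `DATABASE` is a dict standing in for the database tier, and every name here is hypothetical.

```python
import time


class TTLCache:
    """Dict-backed stand-in for a Memcache-style cache with per-entry expiry."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._data = {}  # key -> (value, expires_at)

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._data[key]  # expired entries count as misses
            return None
        return value

    def set(self, key, value):
        self._data[key] = (value, self._clock() + self._ttl)

    def delete(self, key):
        self._data.pop(key, None)


# Assumption: a dict stands in for the database tier; stats counts how often it is hit.
DATABASE = {"user:1": "Ada"}
stats = {"db_reads": 0}
cache = TTLCache(ttl_seconds=60)


def load_from_database(key):
    stats["db_reads"] += 1
    return DATABASE.get(key)


def read(key):
    """Cache-aside read: try the cache, fall back to the database, then populate."""
    value = cache.get(key)
    if value is None:
        value = load_from_database(key)
        if value is not None:
            cache.set(key, value)
    return value


def write(key, value):
    """Write to the database, then invalidate so the next read refetches."""
    DATABASE[key] = value
    cache.delete(key)
```

Repeated calls to `read("user:1")` reach the database only on the first call; a `write` invalidates the entry so the following read goes back to the database once.
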
  23. 23. Amazon DynamoDB is a fully managed NoSQL database service. • Store and retrieve any amount of data • Scale throughput to millions of IO • Single digit millisecond latencies • Without any operational burden.
  24. 24. Manage tables: CreateTable, UpdateTable, DeleteTable, DescribeTable, ListTables • “Select”, “insert”, “update” items: PutItem, GetItem, UpdateItem, DeleteItem • Bulk select or update (max 1MB): BatchGetItem, BatchWriteItem • Query specific items OR scan the full table: Query, Scan
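
To make the difference between the item operations, Query, and Scan on slide 24 concrete, here is a toy in-memory model. It is a sketch only: `ToyTable` and its methods are hypothetical names that mirror the operation names on the slide, it does not call or reproduce the DynamoDB API, and it assumes a table keyed by a hash key plus a sortable range key.

```python
from bisect import bisect_left, bisect_right, insort


class ToyTable:
    """In-memory table keyed by (hash key, range key); illustrative only."""

    def __init__(self):
        self._items = {}   # hash_key -> {range_key: item}
        self._ranges = {}  # hash_key -> sorted list of range keys

    def put_item(self, hash_key, range_key, item):
        rows = self._items.setdefault(hash_key, {})
        if range_key not in rows:
            insort(self._ranges.setdefault(hash_key, []), range_key)
        rows[range_key] = item

    def get_item(self, hash_key, range_key):
        return self._items.get(hash_key, {}).get(range_key)

    def delete_item(self, hash_key, range_key):
        rows = self._items.get(hash_key, {})
        if range_key in rows:
            del rows[range_key]
            self._ranges[hash_key].remove(range_key)

    def query(self, hash_key, low, high):
        """Items under one hash key with low <= range key <= high, in key order."""
        keys = self._ranges.get(hash_key, [])
        start, stop = bisect_left(keys, low), bisect_right(keys, high)
        return [self._items[hash_key][k] for k in keys[start:stop]]

    def scan(self, predicate):
        """Visit every item in the table; the work grows with table size."""
        return [
            item
            for rows in self._items.values()
            for item in rows.values()
            if predicate(item)
        ]


# Hypothetical data: crawl results keyed by page name and a numeric timestamp.
table = ToyTable()
table.put_item("page-a", 1, {"status": 200})
table.put_item("page-a", 2, {"status": 404})
table.put_item("page-b", 1, {"status": 200})
recent = table.query("page-a", 2, 2)                      # touches one hash key
errors = table.scan(lambda item: item["status"] != 200)   # touches every item
```

In this model a query only looks at the rows under one hash key, while a scan walks the whole table, which is the trade-off the slide's "Query specific items OR scan the full table" wording points at.
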
  25. 25. So, what are the tips and techniques for successful deployments?
  26. 26. Educates millions of students • Reaches millions of citizens • Analyzes billions of Ads • Services shown: Amazon EC2, Amazon DynamoDB, Amazon ElastiCache, Amazon RDS, Amazon S3
  27. 27. Educates millions of students (Kimo Rosenbaum) • Reaches millions of citizens • Analyzes billions of Ads
  28. 28. Kimo Rosenbaum – Data Architect, Edmodo
  29. 29. Where learning happens. • Kimo Rosenbaum • AWS re:Invent 2012
  30. 30. Learning 101 • Largest, fastest growing social platform for education • Secure learning network for teachers and students • Browser, iOS, Android • Free for teachers and students
  31. 31. Stats 101 • 100,000 schools • 14 million users • 7 million new users in the last year • 1 million visits daily
  32. 32. Architecture diagram. Components shown: Amazon Route 53 • Amazon CloudFront • Amazon S3 • Elastic Load Balancer • Web instances in an Auto Scaling group • Amazon CloudWatch • Cache instances • RDS MySQL DB instances with read replicas • Availability Zone
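
One way an application tier can make use of a primary DB instance plus read replicas, as drawn on slide 32, is to route statements by type. The sketch below is an assumption about how such routing could look, not a description of Edmodo's implementation: `ReplicaRouter`, the endpoint names, and the simple "starts with SELECT" rule are all hypothetical.

```python
import itertools


class ReplicaRouter:
    """Send writes to the primary and spread reads across replicas round-robin."""

    def __init__(self, primary, replicas):
        self._primary = primary
        # cycle() repeats the replica list forever; None means "no replicas".
        self._replicas = itertools.cycle(replicas) if replicas else None

    def endpoint_for(self, sql):
        """Pick an endpoint for one statement based on a crude read/write check."""
        is_read = sql.lstrip().upper().startswith("SELECT")
        if is_read and self._replicas is not None:
            return next(self._replicas)
        return self._primary


# Hypothetical endpoint names, used only for illustration.
router = ReplicaRouter("primary.db.example", ["replica-1.db.example", "replica-2.db.example"])
read_target = router.endpoint_for("SELECT name FROM users WHERE id = 1")
write_target = router.endpoint_for("UPDATE users SET name = 'Ada' WHERE id = 1")
```

MySQL replication is asynchronous, so a replica may briefly lag the primary; a real router would likely need a way to pin reads that must see a just-committed write to the primary.
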
  33. 33. DBA 101 • Restore from snapshot • Replica creation • Parameter tuning • Metrics collection • Know your app/data
  34. 34. Educates millions of students • Reaches millions of citizens (Jay Edwards) • Analyzes billions of Ads
  35. 35. Jay Edwards – Database Engineer, Obama Campaign
  36. 36. Me. • Twitter: First dedicated DBA • OFA: Lead Database Engineer • PalominoDB: CTO & VP/Operations
  37. 37. Obama for America. • Technically sophisticated for a campaign (not “web-scale”) • Hockey-stick++ growth • Downtime hurts. A lot…really, really, really a lot.
  38. 38. Hockey-stick++
  39. 39. OFA Architecture: ELB • ElastiCache • RDS with Multi-AZ • RDS Read Replica • DynamoDB
  40. 40. Problems! • You always need more databases: OFA had 24+ schemas & 100+ RDS instances • You never have enough DBAs: OFA had 1 – 2 x 0.5 full-time MySQL DBAs
  41. 41. Why RDS? • Makes operational issues very easy: Need more replicas? BAM! Upsize hardware? KAPOW! Point in time restore? BIF!
  42. 42. Why not RDS? • Hardware cap (vertical v. horizontal) • Sophisticated use-cases: Frequent topology changes; Multi-region replication (on their roadmap) • DBAs need busy work
  43. 43. Educates millions of students • Reaches millions of citizens • Analyzes billions of Ads (Andy Skalet)
  44. 44. Andy Skalet - CTO, BrandVerity
  45. 45. Managed Services Bias
  46. 46. New Products/Markets – YesSQL!
  47. 47. Big Data? Cast your problem
  48. 48. AWS Options
  49. 49. Case Study: Crawl history
  50. 50.
  51. 51. Managed services let you focus on creating value • Amazon S3 - Very robust, handles large items, but you filter • Amazon DynamoDB - Extremely fast, scalable, good value (must cast your problem as kvs or key + range) • Amazon RDS - MySQL, without the headaches • Amazon ElastiCache - As memcached, fast kvs for small data • Multi column queries on big data? Looking forward to the AWS solution
  52. 52. Thank you • Free • raghavas@amazon
  53. 53. We are sincerely eager to hear your feedback on this presentation and on re:Invent. Please fill out an evaluation form when you have a chance.