MYSQL Patterns in Amazon - Make the Cloud Work For You



PalominoDB's Jay Edwards and Ben Black will show you how to build your MySQL environment in the cloud: how to maintain it, how to grow it, and how to deal with failure.


  1. MySQL in AWS: Patterns
     Jay Edwards & Ben Black
     PalominoDB
     {jay, ben}@palominodb.com
  2. Agenda
     1. Introduction
     2. RDS, EC2/MySQL
     3. Web console, CLI, API
     4. Performance/Availability
     5. Implementation choices
     6. Managing DDL
     7. Common failures
     8. Cost
     9. Questions
  3. About us
     Jay! CTO, PDB, OFA, Twitter
     Ben! Sr. DBA, PDB, Garmin
     Booth? Yes. Hiring? Yes.
  4. Interactivity
     Ask away; we've got time. Ben will be glad to try and solve your problems.
     AWS tutorial?
     • "Click on the replica button and come back in 30 minutes"
     • "PIOPs <-> EBS. Uncheck that box and come back in 2 hours"
  5. RDS and EC2/MySQL
  6. RDS benefits
     Fully managed
     • High availability
     • Replicas? *click*
     • Point-in-time recovery? *click*
     • *click, click, click*
  7. RDS un-benefits
     Fully managed
     • No binlog access
     • No SUPER privilege
     • No flexible topology
     The more experienced a DBA you are, the crankier you will be.
  8. RDS improves!
     Like all AWS properties, RDS features continue to improve all the time.
     It's perfect for developers, proofs of concept, one-offs, and absorbing temporary load.
     (Tungsten supports replication into RDS from MySQL.)
  9. EC2/MySQL
     All the MySQL you've come to love & hate
     Multi-region via replication & WAN tunnel
  10. Why RDS or EC2?
      RDS
      1. You can tolerate ~99% uptime (which many people can)
      2. You don't have lots of DBAs and need to optimize for operational ease
      EC2
      1. Multi-region availability
      2. Vertical scaling
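To make "~99% uptime" concrete, here is a minimal sketch that converts an availability percentage into a yearly downtime budget. It assumes a 365-day year; the function name is illustrative, not something from the talk.

```python
# Assumes a 365-day year; the helper name is illustrative, not from the talk.
HOURS_PER_YEAR = 365 * 24  # 8,760 hours


def downtime_budget_hours(availability_pct: float) -> float:
    """Hours of downtime per year allowed by an availability percentage."""
    if not 0.0 <= availability_pct <= 100.0:
        raise ValueError("availability_pct must be between 0 and 100")
    return HOURS_PER_YEAR * (100.0 - availability_pct) / 100.0


if __name__ == "__main__":
    for pct in (99.0, 99.2, 99.95):
        print(f"{pct}% uptime allows {downtime_budget_hours(pct):.1f} hours of downtime per year")
```

At 99% that is about 87.6 hours a year; at 99.95% it is about 4.4 hours.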
  11. Questions?
      Any particular scenarios you want to ask us about?
  12. Web Console, CLI, API
  13. Overview
      Functionality isn't complete
      • Some things aren't exposed through every method (console, CLI, API)
  14. Web Console
      Most of the stuff you need for common day-to-day maintenance
      Sometimes it:
      • is slow
      • isn't working
      • needs rage-clicking
  15. CLI setup
      RDS CLI
      export AWS_RDS_HOME
      export AWS_CREDENTIAL_FILE (a file containing your AWSAccessKeyId and AWSSecretKey)
  16. CLI pain
      It's written in Java right now*. The JVM overhead makes it painfully slow for large-scale automation.
      * The future is the Redshift CLI (Python, coherent interface)
  17. CLI output
      Verbose and clunky:
      DBINSTANCE,scp01-replica2,2010-05-22T01:53:47.372Z,db.m1.large,mysql,50,(nil),master,available,,3306,us-east-1b,(nil),0,(nil),(nil),(nil),(nil),(nil),(nil),(nil),(nil),sun:05:00-sun:09:00,23:00-01:00,(nil),n,5.1.50,n,simcoprod01,general-public-license
      SECGROUP,Name,Status
      SECGROUP,default,active
      PARAMGRP,Group Name,Apply Status
      PARAMGRP,default.mysql5.1,in-sync
      Combining the worst features of machine- and human-readable text formats.
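Because every line is comma-separated with a record type in the first field, the output can at least be machine-parsed. A minimal sketch, assuming the sample output from this slide and treating `(nil)` and empty fields as missing; the function name is hypothetical, and no column names are assigned because the slide doesn't label the DBINSTANCE fields.

```python
import csv
import io

# Sample output from the slide, one record per line.
SAMPLE_OUTPUT = """\
DBINSTANCE,scp01-replica2,2010-05-22T01:53:47.372Z,db.m1.large,mysql,50,(nil),master,available,,3306,us-east-1b,(nil),0,(nil),(nil),(nil),(nil),(nil),(nil),(nil),(nil),sun:05:00-sun:09:00,23:00-01:00,(nil),n,5.1.50,n,simcoprod01,general-public-license
SECGROUP,Name,Status
SECGROUP,default,active
PARAMGRP,Group Name,Apply Status
PARAMGRP,default.mysql5.1,in-sync
"""


def parse_rds_cli_output(text: str) -> dict:
    """Group rows by record type, mapping '(nil)' and empty fields to None."""
    records = {}
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue  # skip blank lines
        kind, *fields = row
        cleaned = [None if f in ("", "(nil)") else f for f in fields]
        records.setdefault(kind, []).append(cleaned)
    return records
```

The SECGROUP and PARAMGRP blocks appear to start with a header row (`Name,Status` and `Group Name,Apply Status`); this sketch keeps those as ordinary rows, so `parse_rds_cli_output(SAMPLE_OUTPUT)["PARAMGRP"][1]` is the `default.mysql5.1` entry.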
  18. API
      Use Boto! (Mitch works for AWS.)
  19. Apply immediately
      CLI: --apply-immediately
      Web console: check the box hiding at the bottom of the page.
      Otherwise, modifications wait for the next maintenance window.
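In the API, the same switch is a parameter. A hedged sketch using boto3 (the current successor to the Boto library recommended for the API), whose RDS client `modify_db_instance` call accepts `ApplyImmediately`. The helper name and target instance class are hypothetical, and the AWS call itself is left as a comment because it needs credentials.

```python
# Hypothetical helper: builds keyword arguments for boto3's RDS
# modify_db_instance call. Without ApplyImmediately=True, RDS defers the
# modification to the next maintenance window.
def resize_request(instance_id: str, new_class: str, apply_immediately: bool = True) -> dict:
    return {
        "DBInstanceIdentifier": instance_id,
        "DBInstanceClass": new_class,
        "ApplyImmediately": apply_immediately,
    }


# Usage (needs boto3 installed and AWS credentials, so it is not run here):
#   import boto3
#   rds = boto3.client("rds")
#   rds.modify_db_instance(**resize_request("scp01-replica2", "db.m1.xlarge"))
```

Defaulting `apply_immediately` to `True` mirrors the slide's advice; pass `False` when you actually want the change to wait for the window.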
  20. Availability
      How many nines?
  21. EC2 Region SLA
      99.95% SLA
      "Annual Uptime Percentage" is calculated by subtracting from 100% the percentage of 5-minute periods during the Service Year in which Amazon EC2 was in the state of "Region Unavailable."
      ("Region Unavailable" == "multiple AZs are toast")
      Implies you've got to go multi-region.
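The SLA definition is simple arithmetic over 5-minute periods. A minimal sketch, assuming a 365-day Service Year (105,120 periods); the function names are hypothetical.

```python
import math

# Assumes a 365-day Service Year; helper names are hypothetical.
PERIOD_MINUTES = 5
PERIODS_PER_YEAR = 365 * 24 * 60 // PERIOD_MINUTES  # 105,120 five-minute periods


def annual_uptime_pct(unavailable_periods: int, periods_per_year: int = PERIODS_PER_YEAR) -> float:
    """100% minus the percentage of 5-minute periods in the "Region Unavailable" state."""
    if not 0 <= unavailable_periods <= periods_per_year:
        raise ValueError("unavailable_periods out of range")
    return 100.0 - 100.0 * unavailable_periods / periods_per_year


def max_unavailable_periods(sla_pct: float, periods_per_year: int = PERIODS_PER_YEAR) -> int:
    """Largest number of unavailable 5-minute periods that still meets sla_pct."""
    return math.floor(periods_per_year * (100.0 - sla_pct) / 100.0)
```

A 99.95% SLA therefore tolerates 52 unavailable periods, about 4 hours 20 minutes of region-level unavailability per year.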
  22. EC2 Region SLA
      ~99.2% reality
      The previous definition is very strict: 2 or more Availability Zones; can't create instances; blah, blah.
      1-2x per year multi-AZ degradation (EBS, network, who knows)
  23. Multi-region
      It's coming for RDS. Probably before the end of the year.
      Until then...
  24. Always go Multi-AZ
      Minimal downtime for most maintenance
      Saves you from most master crashes
      But a failover {sometimes, often, frequently} destroys all replicas
  25. Multi-AZ binlogs
      sync_binlog=1
      innodb_support_xa=1
      • Used to DESTROY write throughput
      • MySQL 5.6 made drastic improvements
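On EC2/MySQL these settings live in my.cnf (on RDS they are parameter-group settings). A minimal sketch that checks a my.cnf `[mysqld]` section for them, assuming the file parses as INI (no `!include` directives); the function name is hypothetical. Note that `innodb_support_xa` was later deprecated in MySQL 5.7 and removed in 8.0, where XA support is always enabled.

```python
import configparser

# Accepted values for the settings recommended on the slide (case-insensitive).
REQUIRED = {
    "sync_binlog": {"1"},
    "innodb_support_xa": {"1", "on", "true"},
}


def durability_problems(my_cnf_text: str) -> list:
    """List binlog-durability problems in a my.cnf [mysqld] section (empty list = OK).

    Assumes the file parses as INI: no !include/!includedir directives.
    """
    parser = configparser.ConfigParser(
        allow_no_value=True,             # bare flags such as skip-name-resolve
        strict=False,                    # MySQL tolerates repeated options (last one wins)
        inline_comment_prefixes=("#",),
        interpolation=None,              # treat '%' literally
    )
    parser.read_string(my_cnf_text)
    if not parser.has_section("mysqld"):
        return ["no [mysqld] section"]
    # MySQL treats '-' and '_' in option names as equivalent; normalize to '_'.
    options = {k.replace("-", "_"): v for k, v in parser.items("mysqld")}
    problems = []
    for name, accepted in REQUIRED.items():
        value = options.get(name)
        if value is None or value.strip().lower() not in accepted:
            problems.append(f"{name} = {value!r}; expected one of {sorted(accepted)}")
    return problems
```

An empty list means both settings are present with durable values.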
  26. Questions?
      Questions about designing for availability?
  27. Implementation Choices
  28. Instance sizing
      Dynamicity == reduced cost
      (Now, in general, $$ isn't why you go to the cloud; it's operational efficiency & reduced friction.)
      Have a spreadsheet and do capacity analyses frequently.
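One column such a capacity spreadsheet might carry is time-to-threshold. A minimal sketch, assuming utilization grows by a fixed percentage each month and that you want to resize before crossing 80%; both the growth model and the threshold are assumptions, as is the function name.

```python
import math


def months_until_threshold(current_pct: float, monthly_growth: float, threshold_pct: float = 80.0) -> float:
    """Months until utilization reaches threshold_pct, assuming compound monthly growth.

    current_pct: today's peak utilization (e.g. CPU or IOPS) as a percentage.
    monthly_growth: fractional growth per month, e.g. 0.10 for 10%.
    """
    if current_pct <= 0 or monthly_growth <= 0:
        raise ValueError("current_pct and monthly_growth must be positive")
    if current_pct >= threshold_pct:
        return 0.0  # already over the line: resize now
    return math.log(threshold_pct / current_pct) / math.log(1.0 + monthly_growth)
```

For example, a master peaking at 50% and growing 10% a month has just under five months before it hits 80%.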
  29. Ephemeral SSDs
      Really nice! 150,000* IOPS
      Really bad! ~~POOF~~ (the data is gone when the instance stops or fails)
      • Excellent for replicas
      • Requires operational excellence
      * YMMV
  30. Provisioned IOPS
      Really, really nice!
      • Drastically lower failure rate (an order of magnitude)
      • Guaranteed throughput
      Not so nice:
      • Costs $$
  31. Provisioned IOPS
      Masters and replicas can be different.
      You can convert PIOPS <-> standard EBS back and forth.
      Consider a Multi-AZ PIOPS master for the best durability.
  32. VPCs
      Go VPC from the beginning for production.
      • Hard to convert
      • Use ELBs for internal load-balancing
      • Not sharing the network with everybody
  33. Cluster compute
      Placement groups are available for CC instances.
      "Placement group" means "physically close hardware".
      Very low-latency, full-bisection-bandwidth 10GbE
  34. Questions?
      Questions about your particular setup?
  35. Managing DDL
  36. DDL
      On RDS, it's not possible to perform DDL on a slave and then swap it with the master.
      Slave promotion
      Blocking DDL
  37. DDL
      Online schema changes (require log_bin_trust_function_creators=1)
      No OS access
      Be careful cleaning up if you Ctrl-C
      CALL mysql.rds_skip_repl_error;
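The slide doesn't name the tool, but a common choice on RDS is Percona Toolkit's pt-online-schema-change, which needs `log_bin_trust_function_creators=1` because it creates triggers. If a run is interrupted, it can leave those triggers and a shadow table behind. A minimal sketch that generates cleanup SQL, assuming pt-osc's default naming (`pt_osc_<db>_<table>_ins/upd/del` triggers and a `_<table>_new` table); verify the actual names with SHOW TRIGGERS and SHOW TABLES before running anything. The function names are hypothetical.

```python
# Hypothetical helpers. Assumes pt-online-schema-change's default naming:
# triggers pt_osc_<db>_<table>_{ins,upd,del} and a new table _<table>_new.
def quote_ident(name: str) -> str:
    """Quote a MySQL identifier with backticks, doubling any embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def pt_osc_cleanup_sql(db: str, table: str) -> list:
    """SQL to remove leftovers of an interrupted pt-online-schema-change run."""
    statements = []
    # Drop the triggers first: while they exist, every write to the original
    # table is also copied into the new table.
    for suffix in ("ins", "upd", "del"):
        trigger = f"pt_osc_{db}_{table}_{suffix}"
        statements.append(f"DROP TRIGGER IF EXISTS {quote_ident(db)}.{quote_ident(trigger)};")
    # Then drop the partially populated copy of the table.
    new_table = f"_{table}_new"
    statements.append(f"DROP TABLE IF EXISTS {quote_ident(db)}.{quote_ident(new_table)};")
    return statements
```

Generating the statements instead of executing them leaves a human in the loop, which is the point of "be careful".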
  38. Questions?
      Questions about DDL?
  39. Escape from RDS
      mysql schema
      --routines
      users?
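Leaving RDS usually means mysqldump. A minimal sketch that builds the command line, assuming you skip the `mysql` system schema (carrying users and grants over separately) and ask for stored routines explicitly, since mysqldump omits them by default. The helper name, the system-schema list, and the example database `shop` are assumptions; the flags are standard mysqldump options.

```python
import shlex

# Schemas that belong to the server rather than the application (assumed list).
SYSTEM_SCHEMAS = {"mysql", "information_schema", "performance_schema", "sys"}


def mysqldump_argv(host: str, databases: list) -> list:
    """Build a mysqldump command line for the application schemas only."""
    app_dbs = [d for d in databases if d not in SYSTEM_SCHEMAS]
    if not app_dbs:
        raise ValueError("no application databases to dump")
    return [
        "mysqldump",
        f"--host={host}",
        "--single-transaction",  # consistent InnoDB snapshot
        "--routines",            # stored procedures/functions are omitted by default
        "--triggers",            # on by default; explicit for clarity
        "--databases",
        *app_dbs,
    ]


if __name__ == "__main__":
    print(shlex.join(mysqldump_argv("olddatabasehost", ["mysql", "shop"])))
```

Returning an argument list rather than a string avoids shell-quoting mistakes when the command is later run with `subprocess.run`.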
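  40. Dumping Users
      # List application accounts as 'user'@'host' (skipping RDS-internal users and the master user),
      # then write each account's grants out as executable SQL statements.
      mysql --host=olddatabasehost -BNe "select concat('\'',user,'\'@\'',host,'\'') from mysql.user where user not like 'rds%' and user != 'master'" |
        while read -r uh; do
          mysql --host=olddatabasehost -BNe "show grants for $uh" | sed 's/$/;/; s/\\\\/\\/g'
        done > user_grants.sql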
  41. Common failures
      (Should really be called "zones and regions")
  42. Operations is about managing change and mitigating risk.
  43. Local failures
      Database crashes
      Human error
      Localized EBS hang
      How to mitigate?
      • Multi-AZ PIOPS master
      • Operational excellence
      • Throw away & rebuild replicas
  44. Local failures redux
      Local failures should be, at most, annoyances.
      • Runbooks*
      • Game days
      • Monitoring
      * Process is a poor substitute for competence.
  45. If you can't deal with expected and desired change, you'll never be able to handle unexpected and unwanted change.
  46. Regional failures
      A well-designed architecture will save you.
      • How quickly can your DNS flip?
      • How good is your replication?
      • Do you have a CDN?
      • Is your application going to run?
      Not everybody can afford this.
  47. Zones and Regions
      A zone is analogous to a data center (for some small number of buildings).
      A region is a geographically dispersed collection of zones that is distinct from any other region.
  48. Zones & Regions differ
      Different instance types
      Different features
      Different provisioning capacity
      OFA had ~40% of the US-East medium instances at one point. Couldn't duplicate that in US-West.
  49. Questions?
  50. Cost
  51. Reserved instances
      • Substantial savings (how often do you turn off production databases?)
      • Secondary market
      • Must match AZ and instance size
      • Discount coupon
      Heavy-utilization instances charge the hourly rate 24x7.
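Because heavy-utilization reservations bill the hourly rate 24x7, the break-even question is how much of the term the instance would otherwise run on demand. A minimal sketch assuming a 365-day year; the prices in the usage note are made-up placeholders, not AWS rates, and the function names are hypothetical.

```python
HOURS_PER_YEAR = 365 * 24  # assumes a 365-day year


def heavy_ri_effective_hourly(upfront: float, hourly: float, term_years: int = 1) -> float:
    """Effective hourly cost of a heavy-utilization reservation billed 24x7 for the term."""
    hours = HOURS_PER_YEAR * term_years
    return upfront / hours + hourly


def breakeven_utilization(upfront: float, hourly: float, on_demand_hourly: float, term_years: int = 1) -> float:
    """Fraction of the term an instance must run on demand before the reservation is cheaper."""
    return heavy_ri_effective_hourly(upfront, hourly, term_years) / on_demand_hourly
```

With a hypothetical $1,000 upfront, $0.10/hour reserved rate and $0.30/hour on-demand rate, the effective reserved rate is about $0.214/hour, so the reservation pays off if the instance would run more than about 71% of the year. A production database that never gets turned off clears that easily.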
  52. Watch the $
      Spreadsheet!
      Inventory!
      Load analysis!
      Cloudability!
  53. Dynamicity
      The only thing you can't do is downsize storage.
      Change instance size? Check.
      Turn PIOPS off? Check.
      Delete replicas? Check.
      Up to meet need. Down to meet budget.
  54. Upgrading
      Minor upgrade (can happen automatically during the maintenance window; will reboot or fail over)*
      * Disable automatic minor version upgrades
      Upgrade from 5.5 to 5.6:
      1) Dump/load
      2) Delta load
      3) Switchover
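If the "delta load" is done by replicating from the old 5.5 master into the freshly loaded 5.6 server (an assumption; the slide doesn't say how), the switchover gate might check replication health first. A minimal sketch over one row of SHOW SLAVE STATUS as a column-name-to-value dict; the function name and the zero-lag default are assumptions.

```python
def ready_for_switchover(status: dict, max_lag_seconds: int = 0) -> bool:
    """Decide whether the new replica has caught up enough to switch over.

    `status` is one row of SHOW SLAVE STATUS as a dict of column name -> value.
    """
    # Both replication threads must be running.
    if status.get("Slave_IO_Running") != "Yes" or status.get("Slave_SQL_Running") != "Yes":
        return False
    lag = status.get("Seconds_Behind_Master")
    if lag is None:  # NULL means MySQL cannot compute lag (replication not healthy)
        return False
    return int(lag) <= max_lag_seconds
```

Seconds_Behind_Master is only an approximation, so a cautious cutover would still stop writes on the old master and let the replica drain before pointing clients at it.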
  55. Fin
      Ask away!