From dedicated to cloud infrastructure

Presentation given to the gaming technology forum in the UK on 13/11/2009, covering challenges and strategies for migrating to the cloud.

  1. From dedicated to cloud infrastructure
     Gojko Adzic, Advanced Games Lab
     http://gojko.net [email_address]
  2. Why?
       • Less hardware = less hassle
       • Scale up on demand to handle peaks
       • Scale down afterwards to save money
  3. What?
       • Anything that is neither security-sensitive nor restricted by regulation
           • Web sites
           • Message servers
           • Price feeds, screen scraping...
           • Public data
  4. Challenge #1: no NAS
       • S3 is slow
       • An EBS volume attaches to only one instance
       • SimpleDB is a big hash table, reliable but slow
       • However:
           • New SQL service
           • Asynchronous persistence with data caches
           • Offload to SQS
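The "asynchronous persistence with data caches" idea on the slide above can be sketched as a write-behind cache: reads and writes hit memory, and a background thread pushes changes to the slow store. This is a minimal illustration, not the presenter's implementation. The `persist` callable and the class name are hypothetical stand-ins for whatever store (S3, SimpleDB, a SQL service) sits behind the cache.

```python
import queue
import threading


class WriteBehindCache:
    """In-memory cache that persists writes asynchronously on a worker thread."""

    def __init__(self, persist):
        # `persist` is a hypothetical callable(key, value) wrapping the slow store.
        self._persist = persist
        self._data = {}
        self._lock = threading.Lock()
        self._pending = queue.Queue()
        self.failed = []  # (key, value, error) for writes the store rejected
        worker = threading.Thread(target=self._drain, daemon=True)
        worker.start()

    def put(self, key, value):
        with self._lock:
            self._data[key] = value
        # Queue the write and return immediately; the caller never waits on the store.
        self._pending.put((key, value))

    def get(self, key, default=None):
        with self._lock:
            return self._data.get(key, default)

    def flush(self):
        """Block until every queued write has been attempted."""
        self._pending.join()

    def _drain(self):
        while True:
            key, value = self._pending.get()
            try:
                self._persist(key, value)
            except Exception as error:  # keep the worker alive on store errors
                self.failed.append((key, value, error))
            finally:
                self._pending.task_done()
```

The trade-off is durability: anything still in the queue is lost if the instance disappears, so this suits data that can be rebuilt or tolerates a short loss window.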
  5. Challenge #2: undedicated network
       • No multicast
       • Machines can be locked out for 10-15 minutes
       • Occasionally unreliable networking between nodes
  6. Challenge #3: load balancing
       • Can't count on any particular node being reliable
       • Basic TCP clustering available
           • No sticky sessions
       • However:
           • IP reassignment is easy, so DNS round-robin is an option
           • Automatic cluster up-scaling
  7. Challenge #4: CDN
       • CloudFront has a 24-hour refresh cycle
       • 1 CNAME per distribution
       • No SSL
       • However:
           • A new distribution takes ~10 minutes
           • S3 accessed directly supports HTTP and SSL
  8. Challenge #5: shared knowledge
       • Machines go up and down, new ones get added
       • No NAS to store shared configuration
       • However:
           • Map /etc/hosts on S3
           • Put config into SimpleDB, use cron tasks to refresh machines
  9. Challenge #6: security
       • No cleanup guarantees
       • No SLAs
       • No real control over security
           • A VPN is available to protect transport
  10. Preparing for the cloud
       • Split the data
       • Break into standalone stateless systems
       • Prefer horizontal scaling for stateful parts
       • Closely monitor single points of failure
       • Use HA resources
  11. Splitting the data
       • Not all data is the same
           • Does it need transactions?
           • Does it need security?
           • Does it need querying?
       • Probably never move accounts, transactions and key customer data to the cloud
           • But the cloud is a really good fit for profiles, reference data, possibly scrubbed tx logs/betting history...
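The three questions on the slide above can be turned into a simple placement rule. The sketch below is one possible reading, not the presenter's rule: the `DataSet` type, the placement labels, and the decision to treat querying as a choice between two kinds of cloud store are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataSet:
    name: str
    needs_transactions: bool
    needs_security: bool
    needs_querying: bool


def placement(data_set):
    """Suggest where a data set could live (labels are illustrative)."""
    # Transactional or sensitive data stays on dedicated infrastructure.
    if data_set.needs_transactions or data_set.needs_security:
        return "dedicated"
    # Assumption: queryable data needs a store with query support.
    if data_set.needs_querying:
        return "cloud-queryable"
    # Everything else can sit in a plain key-value store.
    return "cloud-key-value"


accounts = DataSet("accounts", True, True, True)
profiles = DataSet("profiles", False, False, False)
reference = DataSet("reference data", False, False, True)

for data_set in (accounts, profiles, reference):
    print(data_set.name, "->", placement(data_set))
```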
  12. Standalone stateless systems
       • Isolate blocks that can easily be replicated
           • Prepare AMIs with full software/security settings
           • Retrieve configuration from SimpleDB or S3 on start
           • Use TCP clustering or automated DNS round-robin to expose new servers
       • Push state into HA resources
           • Optional caching at this level
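Retrieving configuration on start, as the slide above suggests, might look like the following sketch: defaults are baked into the machine image, and overrides are fetched from the shared store at boot. The `fetch_remote` callable, the `DEFAULTS` keys, and the JSON encoding are hypothetical; the talk does not say how the configuration is stored or parsed.

```python
import json

# Hypothetical defaults baked into the image (AMI).
DEFAULTS = {"cache_ttl_seconds": 60, "upstream_hosts": []}


def load_config(fetch_remote, defaults=DEFAULTS):
    """Merge remote overrides over image defaults.

    `fetch_remote` is an assumed callable returning a JSON string, for
    example a thin wrapper around an S3 or SimpleDB read.
    """
    config = dict(defaults)
    try:
        raw = fetch_remote()
    except OSError:
        # Design choice for this sketch: boot with defaults if the store
        # is unreachable, rather than refusing to start.
        return config
    config.update(json.loads(raw))
    return config
```

Whether a node should start with stale defaults or fail fast when the store is unreachable depends on the system; the fallback here is only one option.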
  13. Horizontal scaling for state
       • Use data grids to off-load and cluster
           • Coherence, GigaSpaces, Terracotta
       • Automate packaging and configuration as much as possible (RPMs, S3, SimpleDB)...
       • Ensure that the configuration can grow dynamically
       • Use software that survives disconnects and unreliable networks
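For code you write yourself, "surviving disconnects and unreliable networks" usually starts with retrying transient failures. The sketch below shows retry with exponential backoff and jitter, a common technique that the slides do not name explicitly. The function name, parameters, and default values are assumptions chosen for illustration.

```python
import random
import time


def call_with_retries(operation, attempts=5, base_delay=0.1, max_delay=5.0,
                      sleep=time.sleep,
                      retry_on=(ConnectionError, TimeoutError)):
    """Call `operation`, retrying transient network errors with backoff."""
    for attempt in range(attempts):
        try:
            return operation()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            # Double the ceiling on each attempt, capped at max_delay.
            ceiling = min(max_delay, base_delay * 2 ** attempt)
            # Random jitter keeps many nodes from retrying in lockstep.
            sleep(random.uniform(0, ceiling))
```

Retries are only safe for idempotent operations; a non-idempotent call (such as placing a bet) needs a deduplication key or a different strategy.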
  14. If clustering is not possible, keep your eyes open
       • Monitor the system closely and prepare for a quick reaction
           • Ideally a full AMI that loads configuration from S3
           • If not, have RPMs ready
           • Internal IPs aren't recyclable, so make sure other systems can switch to a different resource
               • S3 hosts file, (DNS?), cron to reload configuration
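The "S3 hosts file plus cron" idea above can be sketched as a small script that cron runs on every machine: fetch the current name-to-IP map, render a hosts file, and swap it in atomically. The `fetch_entries` callable is a hypothetical stand-in for the S3 or SimpleDB read, and the file layout is an assumption.

```python
import os
import tempfile


def render_hosts(entries):
    """Render a hosts file from a {hostname: ip} mapping."""
    lines = ["127.0.0.1 localhost"]
    for name in sorted(entries):
        lines.append(f"{entries[name]} {name}")
    return "\n".join(lines) + "\n"


def refresh_hosts(fetch_entries, path):
    """Rewrite `path` if the shared map changed; return True on change."""
    content = render_hosts(fetch_entries())
    try:
        with open(path) as current:
            if current.read() == content:
                return False  # nothing changed, leave the file alone
    except FileNotFoundError:
        pass
    # Write to a temp file in the same directory, then rename over the
    # target so readers never see a half-written file.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as tmp:
        tmp.write(content)
    os.chmod(tmp_path, 0o644)  # mkstemp creates the file as 0600
    os.replace(tmp_path, path)
    return True
```

When a failed node is replaced with a new internal IP, updating the shared map is then enough for every other machine to pick up the new address on its next cron run.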
  15. Use HA resources
       • SimpleDB
       • S3
       • Hash databases (NoSQL)
       • SQS (beware of the 8k limit)
       • CloudMQ
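One common way to live with a small queue message limit, such as the 8k SQS limit mentioned above, is to keep large payloads in a blob store and send only a reference through the queue. The sketch below illustrates that pattern with a plain dict standing in for S3. The envelope format, key naming, and function names are all hypothetical, and the limit constant simply mirrors the figure on the slide.

```python
import json
import uuid

# Limit as quoted on the slide; check the current service limit before relying on it.
MAX_MESSAGE_BYTES = 8 * 1024


def to_message(payload, blob_store):
    """Build a queue message body, offloading oversized payloads.

    `blob_store` is any dict-like object standing in for S3.
    """
    body = json.dumps({"inline": payload})
    if len(body.encode("utf-8")) <= MAX_MESSAGE_BYTES:
        return body
    # Too big for the queue: store the payload and send a pointer instead.
    key = f"payloads/{uuid.uuid4()}"
    blob_store[key] = json.dumps(payload)
    return json.dumps({"ref": key})


def from_message(body, blob_store):
    """Recover the original payload from a message body."""
    envelope = json.loads(body)
    if "ref" in envelope:
        return json.loads(blob_store[envelope["ref"]])
    return envelope["inline"]
```

The consumer should delete the stored blob once the message is processed, otherwise offloaded payloads accumulate.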
  16. Questions?
       • gadzic@advancedgameslab.com
       • http://gojko.net
