
(BDT204) Rendering a Seamless Satellite Map of the World with AWS and NASA Data | AWS re:Invent 2014


NASA imaging satellites deliver gigabytes of imagery to Earth every day. Mapbox uses AWS to process that data in real time and build the most complete, seamless satellite map of the world. Learn how Mapbox uses Amazon S3 and Amazon SQS to stream data from NASA into clusters of EC2 instances running a clever algorithm that stitches images together in parallel. This session includes an in-depth discussion of high-volume storage with Amazon S3, cost-efficient data processing with Amazon EC2 Spot Instances, reliable job orchestration with Amazon SQS, and demand resilience with Auto Scaling.



  1. © 2014 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon.com, Inc. November 13, 2014 | Las Vegas. BDT204: Rendering a Seamless Satellite Map of the World with AWS and NASA Data. Eric Gundersen and Will White, Mapbox
  2. Amazon EC2: offers low-cost, scalable computing. Amazon S3: data storage for input data and processed output. Auto Scaling: controls the number of worker EC2 instances. Amazon SQS: manages the units of work.
  3. Mapbox Satellite
  4. [Chart annotations over time: "hmmm, this is slow going" → upgrade EC2 type → "w00t! killing it" → spiked regional spot pricing increases spot costs]
  5. One image every day for the last two years.
  6. 17,179,869,184 pixels × 365 days × 2 years = 12.5 trillion pixels
  7. That’s a lot of pixels…
  8. We need to •Quickly process massive amounts of data •Distribute processed data to users around the world quickly and reliably •Keep costs low
  9. Processing
  10. Processing requirements •Massive storage for raw and processed data •Massive computing that we can spin up and down in minutes •Everything must be fully automated •Low cost
  11. Amazon EC2: low-cost, scalable computing. Amazon S3: data storage for input data and processed output. Auto Scaling: controls the number of worker EC2 instances. Amazon SQS: manages the queue of work.
  12. NASA Server → Source S3 Bucket → Watcher Instance → SQS Queue → Auto Scaling group of Worker Instances → Destination S3 Bucket → Processed Outputs
  13. Watcher EC2 instance •Copies raw data files from the NASA server to our S3 bucket •Splits each file up into smaller parts and sends them into Amazon SQS as messages
  14. Why stash raw data on Amazon S3? •Extremely low latency between Amazon S3 and Amazon EC2 in the same AWS region •Don’t want to hammer NASA servers with requests from our hundreds of workers •Easy to reprocess data later
  15. Messages for Amazon SQS •Take a big job and split it up into smaller parts •Shorter is better: a few minutes per message is ideal •Messages need to be repeatable in case of failure
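As a rough illustration of the watcher's fan-out step, here is a minimal Python sketch. The granule IDs, batch size, and queue URL are hypothetical, not Mapbox's actual code; the boto3 calls are shown only to indicate where SQS comes in:

```python
import json


def split_into_messages(granule_ids, batch_size=4):
    """Split a big ingest job into small, repeatable SQS message bodies.

    Each message describes only a few minutes of work and carries
    everything a worker needs to redo the job if it fails.
    """
    messages = []
    for i in range(0, len(granule_ids), batch_size):
        batch = granule_ids[i:i + batch_size]
        messages.append(json.dumps({"granules": batch}))
    return messages


def enqueue(queue_url, bodies):
    """Send each message body to SQS (requires AWS credentials)."""
    import boto3  # imported here so the pure splitting logic stays testable
    sqs = boto3.client("sqs")
    for body in bodies:
        sqs.send_message(QueueUrl=queue_url, MessageBody=body)
```

Keeping the split logic separate from the send loop makes the "repeatable in case of failure" property easy to reason about: a message body fully describes its unit of work.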
  16. Raw data → SQS messages
  17. SQS messages
  18. NASA Server → Source S3 Bucket → Watcher Instance → SQS Queue → Auto Scaling group of Worker Instances → Destination S3 Bucket → Processed Outputs
  19. Worker EC2 instance: grab a message from the SQS queue → download raw data from the source S3 bucket → run software to process the data → deliver processed data to the destination S3 bucket → delete the message from the queue to mark it complete
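The worker loop on this slide can be sketched as below. The download/process/upload callables and queue URL are placeholders; the key detail from the slide is that the SQS message is deleted only after the output is delivered, so a crashed worker's message becomes visible again and is retried by another instance:

```python
import json


def process_message(body, download, process, upload):
    """Handle one unit of work: fetch inputs, run the processing step,
    deliver outputs. The callables are injected so the loop itself stays
    generic (and testable without AWS)."""
    job = json.loads(body)
    raw = download(job["granules"])
    result = process(raw)
    upload(result)
    return result


def worker_loop(queue_url, download, process, upload):
    """Long-poll SQS and delete each message only after its outputs are
    safely delivered, so failed work reappears in the queue."""
    import boto3  # AWS access only needed here
    sqs = boto3.client("sqs")
    while True:
        resp = sqs.receive_message(QueueUrl=queue_url,
                                   MaxNumberOfMessages=1,
                                   WaitTimeSeconds=20)
        for msg in resp.get("Messages", []):
            process_message(msg["Body"], download, process, upload)
            sqs.delete_message(QueueUrl=queue_url,
                               ReceiptHandle=msg["ReceiptHandle"])
```

If processing takes longer than the queue's visibility timeout, the same message can be handed to two workers at once, which is why the earlier slide insists messages be repeatable.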
  20. NASA Server → Source S3 Bucket → Watcher Instance → SQS Queue → Auto Scaling group of Worker Instances → Destination S3 Bucket → Processed Outputs
  21. Worker Auto Scaling group •Capacity is controlled by the number of messages in the queue •Spikes are no problem: more instances come online automatically
  22. Data processing: Amazon SQS (queue size) → CloudWatch → Auto Scaling
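A sketch of the scaling rule implied by this diagram: desired worker capacity tracks the queue backlog. The ratio and fleet cap here are made-up numbers, and in practice this is wired up as a CloudWatch alarm on the SQS queue-size metric driving an Auto Scaling policy rather than custom code:

```python
def desired_workers(queue_depth, msgs_per_instance=50, max_instances=200):
    """Roughly one worker per msgs_per_instance queued messages
    (rounded up), capped at the fleet limit; zero when the queue
    is empty."""
    if queue_depth <= 0:
        return 0
    # -(-a // b) is ceiling division in Python
    return min(max_instances, -(-queue_depth // msgs_per_instance))
```

This is what makes spikes "no problem": a burst of messages raises the queue-size metric, which raises the desired capacity, which brings more instances online automatically.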
  23. [Chart: SQS messages vs. running EC2 instances over time]
  24. [Chart continued]
  25. [Chart continued]
  26. [Chart continued]
  27. NASA Server → Source S3 Bucket → Watcher Instance → SQS Queue → Auto Scaling group of Worker Instances → Destination S3 Bucket → Processed Outputs
  28. How can we make this cheap?
  29. Spot market •Bid on unused Amazon EC2 capacity and get a discount •An instance runs as long as your bid price is higher than the market price •If the market price spikes, your instances are terminated immediately •Perfect for big data processing jobs that aren’t on a critical schedule
  30. On-Demand Market: c3.xlarge / us-east-1e / $0.210 per hour
  31. On-Demand Market: c3.xlarge / us-east-1e / $0.210 per hour = $151.20 per month
  32. [Chart: spot price history, avg price $0.032]
  33. Spot Market: c3.xlarge / us-east-1e / $0.0321 per hour = $23.11 per month
  34. Spot Market: running 200 c3.xlarge instances = $25,618 in savings per month
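The cost figures on these slides check out with simple arithmetic, assuming a 720-hour (30-day) month, which matches the slides' numbers:

```python
HOURS_PER_MONTH = 720  # 30-day month, as implied by the slides' figures

on_demand = 0.210 * HOURS_PER_MONTH     # c3.xlarge on-demand: $151.20/month
spot = 0.0321 * HOURS_PER_MONTH         # c3.xlarge spot: ~$23.11/month
savings_200 = 200 * (on_demand - spot)  # fleet of 200: ~$25,618/month saved
```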
  35. The graph isn’t always flat.
  36. [Chart: spot price spike, bid price $1.90]
  37. [Chart: spot price spike, bid price $0.60]
  38. [Chart: spot price spike, bid price $0.60]
  39. [Chart: spot price spike, bid price $1.15]
  40. Spot market •Jobs need to be small (just like with Amazon SQS) •Be prepared for spikes: wait them out or increase your bid price
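The two spike responses on this slide (wait it out vs. raise the bid) amount to a simple decision rule. This sketch is our own illustration rather than Mapbox's logic, and the backlog threshold is arbitrary:

```python
def spike_response(market_price, bid_price, backlog, backlog_limit=10000):
    """Spot-spike strategy from the slide: below the bid, keep running;
    above it, wait out the spike while the backlog is manageable,
    otherwise raise the bid to keep the fleet alive."""
    if market_price <= bid_price:
        return "keep"       # instances keep running at the market price
    if backlog < backlog_limit:
        return "wait"       # let the SQS queue absorb the delay
    return "raise_bid"      # pay more because work is piling up
```

Small jobs are what make "wait" cheap: a terminated instance loses at most a few minutes of work, and its message simply reappears in the queue.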
  41. How do we get the data to users?
  42. Distribution
  43. In the past 30 days we have served 9.8 billion requests.
  44. That’s a lot of requests…
  45. Distribution requirements •Massive storage for processed data •HTTP server capacity that we can spin up and down in minutes •Global distribution for speed and redundancy •Everything must be fully automated •Low cost
  46. Amazon EC2: low-cost, scalable computing. Amazon S3: data storage for input data and processed output. Auto Scaling: controls the number of worker EC2 instances. Amazon SQS: manages the units of work.
  47. Amazon EC2: low-cost, scalable computing. Amazon S3: data storage for input data and processed output. Auto Scaling: controls the number of worker EC2 instances. Amazon SQS: manages the units of work.
  48. Amazon EC2: low-cost, scalable computing. Amazon S3: data storage for input data and processed output. Auto Scaling: controls the number of worker EC2 instances. Elastic Load Balancing: distributes web traffic between multiple EC2 instances.
  49. NASA Server → Source S3 Bucket → Watcher Instance → SQS Queue → Auto Scaling group of Worker Instances → Destination S3 Bucket → Processed Outputs
  50. Processed outputs replicated to S3 buckets in Virginia, São Paulo, Ireland, Tokyo, California, Singapore, Sydney, Oregon, and Frankfurt
  51. Per region: S3 Bucket → Auto Scaling group of Server Instances → Elastic Load Balancing
  52. Users → Amazon Route 53 → regions
  53. Data processing: Amazon SQS (queue size) → CloudWatch → Auto Scaling
  54. Data processing: Amazon SQS (queue size) → CloudWatch → Auto Scaling. Data distribution: Elastic Load Balancing (request count) → CloudWatch → Auto Scaling.
  55. [Charts: requests over 7 days; running instances over 7 days]
  56. [Chart: running instances across all regions over 7 days]
  57. How can we make this cheap?
  58. Instance reservations •Buy computing up front for long-running instances •Large upfront charge in exchange for a low hourly usage cost •Save up to 60% or more over the course of a year •Perfect for critical instances that need to stay online
  59. Reservations about reservations •It took us over a year to commit •Changing infrastructure: splitting applications, new instance types
  60. What made us eventually buy •Easily swap reservations between instances within the same family •Sell unused reservations on the secondary market •Cloudability: a great reservation recommendation tool
  61. Amazon EC2, Amazon S3, Auto Scaling, Amazon SQS, CloudWatch, Elastic Load Balancing, Amazon Route 53, CloudFront
  62. Please give us your feedback on this session. Complete session evaluations and earn re:Invent swag. BDT204 http://bit.ly/awsevals
