Massive Message Processing with Amazon SQS and Amazon DynamoDB (ARC301) | AWS re:Invent 2013

Amazon Simple Queue Service (SQS) and Amazon DynamoDB together form a fast, reliable, and scalable layer for receiving and processing high volumes of messages, thanks to their distributed, highly available architectures. We propose a complete system that handles any volume of data or level of throughput without losing messages and without requiring other services to be always available. It also lets applications process messages asynchronously and add compute resources based on the number of messages enqueued.
The whole architecture helps applications meet predefined SLAs, because more workers can be added to increase overall throughput. It also lowers total cost, because the extra workers run only briefly and only when they are needed.
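Deciding how many workers to add for a given SLA is simple arithmetic: instances = ⌈backlog ÷ (per-instance throughput × deadline)⌉. The sketch below illustrates it under the simplifying assumption that every worker sustains a constant throughput. The function names and the min/max clamp (standing in for an Auto Scaling group's minimum and maximum size) are hypothetical and not part of the talk. With the deck's figure of 100k messages/minute for one m1.xlarge instance, draining 5M messages in 5 minutes needs 10 instances.

```python
import math


def workers_needed(backlog: int, msgs_per_worker_per_min: float, deadline_min: float) -> int:
    """Smallest number of identical workers that drains `backlog` within `deadline_min` minutes.

    Assumes each worker sustains a constant `msgs_per_worker_per_min` (a simplification).
    """
    if backlog <= 0:
        return 0
    if msgs_per_worker_per_min <= 0 or deadline_min <= 0:
        raise ValueError("throughput and deadline must be positive")
    capacity_per_worker = msgs_per_worker_per_min * deadline_min
    return math.ceil(backlog / capacity_per_worker)


def desired_capacity(visible_messages: int, msgs_per_worker_per_min: float,
                     deadline_min: float, min_size: int, max_size: int) -> int:
    """Hypothetical helper: clamp the estimate to an Auto Scaling group's min/max size."""
    needed = workers_needed(visible_messages, msgs_per_worker_per_min, deadline_min)
    return max(min_size, min(max_size, needed))


if __name__ == "__main__":
    # Figures from the deck: 100k msgs/minute per m1.xlarge, 5M messages, 5 minutes.
    print(workers_needed(5_000_000, 100_000, 5))  # 10
```

In a real deployment the backlog figure would come from the queue's ApproximateNumberOfMessagesVisible metric (for example through a CloudWatch alarm, as in the deck's "Create alarms" step). How that number becomes a scaling action depends on the scaling policy you configure.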

  1. ARC301 - Controlling the Flood: Massive Message Processing with Amazon SQS and Amazon DynamoDB. Ari Dias Neto, Ecosystem Solution Architect. November 14, 2013. © 2013 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon.com, Inc.
  2. Who am I? The Mailman from Brazil: delivering messages around the world!
  3. Returning all the messages…
  4. How many mailmen? When? For how long?
  5. Who am I? Ari Dias Neto, Ecosystem Solutions Architect. What are we going to do? We are going to design and build an application to handle any volume of messages. Right now!
  6. Scenario: Super Bowl promotion. Who is going to win?
  7. Promotion: subscription based on SMS (the cellphone number is the key). Requirements:
      • We cannot lose any message
      • We need to process all the valid messages
      • Log all the invalid messages and errors
      • Beautiful dashboard at the end
      • We must process all the messages during the event!
  8. Who is going to be on the front line? Fast! Scalable! Reliable! Simple! Fully managed!
  9. Amazon Simple Queue Service (SQS): a fully managed queue service
  10. Any volume of data, at any level of throughput
  11. We cannot lose any message
  12. No up-front or fixed expenses
  13. Architecture: starting with SQS. We have received all the messages in SQS; now we need to process all of them.
  14. Architecture: Amazon EC2 instances consuming from SQS. But how many instances?
  15. Architecture: a multithreaded application on the EC2 instances, to reduce costs and increase performance
  16. Architecture: worker threads consuming from SQS
  17. But how many instances do we need? One EC2 m1.xlarge instance: 100k msgs/minute. 10 instances: 1M msgs/minute. 10 instances: 5M messages in 5 minutes.
  18. Architecture: Auto Scaling based on the number of messages in the queue (SQS feeding an Auto Scaling group)
  19. Architecture: where should we save all the messages? High throughput is needed. (SQS feeding an Auto Scaling group)
  20. Amazon DynamoDB
  21. Two DynamoDB tables: valid-votes and invalid-votes
  22. Architecture: SQS feeding an Auto Scaling group that writes to the valid-votes and invalid-votes tables
  23. The dashboard
  24. Final architecture: SQS; workers in an Auto Scaling group; DynamoDB; a web dashboard in an AWS Elastic Beanstalk container
  25. Benefits
      • Ready for any level of throughput: SQS
      • Ready for any required SLA: Auto Scaling and EC2
      • Low cost: fully managed queue service; infrastructure sized to the required SLA; infrastructure needed only for a short period of time
  26. The challenge! Process all the messages from the queue in 10 minutes!
  27. Let's go deep! Let's code!
  28. Each thread:
      • Connects to the SQS queue
      • Reads up to 10 messages from the queue
      • Validates each message
      • Saves it as valid or invalid in DynamoDB
      • Marks it as "read" in the queue (in SQS, that is, deletes it)
  29. Each thread: same steps as slide 28
  30. Steps to deploy it on AWS
      ✔ Create the queue. Queue name: votes
      ✔ Upload the application to S3: s3-sa-east-1.amazonaws.com/arineto/processor.jar
      ✔ Create an AMI with the JRE. Image ID: ami-05355a6c
      ☐ Create the bootstrap script: userdata.txt
      ☐ Create the launch configuration
      ☐ Create the Auto Scaling group
      ☐ Create the alarms
      ☐ Launch it!
  31. The company: BigData Corp. was founded to help companies solve the challenges associated with big data, from collection to processing to information and knowledge extraction.
  32. The challenge: "How many e-commerce websites exist in your continent? Can we monitor them on a consistent basis?" Build a crawling process that can answer this question in a cost-effective and speedy manner.
  33. Architecture: Spot Instances + SQS + S3 = Magic
      • Spot Instances allow us to optimize processing costs
      • Amazon SQS allows us to orchestrate the process in a distributed and asynchronous manner
      • Amazon Simple Storage Service (S3) facilitates the storage of intermediate and final processing results
  34. Architecture components:
      • Maestro (reserved instance) with the list of crawl URLs
      • Command and Control Queue
      • Main workers (Spot Instances): execute crawling and process data
      • Secondary work queues: processed data
      • Secondary workers (queue listeners, Spot Instances): reprocess data, query additional services, and store data on a MongoDB cluster
  35. Architecture (3)
      • Message volumes
        – Processing starts by uploading more than 10 million messages
        – Each processed message may generate up to 10 new intermediate messages
        – Peak processing of 70K messages/second
      • Command & Control Queue
        – This queue enables us to adjust processing as we go and request status checks from instances
  36. Results (1): [Chart: estimated cost without AWS vs. cost with AWS; cost axis from $0 to $900,000, horizontal axis from 0 to 12]
  37. Results (2)
      • 2+ PB of data processed
      • 40+ billion web pages visited and parsed
      • 500+ services and technologies mapped
      • A completely new view of the web market
  38. Please give us your feedback on this presentation (ARC301). As a thank-you, we will select prize winners daily for completed surveys!
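The per-thread loop on slide 28, run in several threads per instance as on slides 15 and 16, can be sketched in Python as follows. This is a minimal sketch, not the talk's processor.jar. Everything about the data is a hypothetical assumption: message bodies are JSON such as {"phone": "...", "team": "..."}, the valid-votes table is keyed on `phone`, the invalid-votes table is keyed on `message_id`, and the team names and phone-number rule are placeholders. The clients are passed in, so the same code can run against boto3 objects (`boto3.client("sqs")`, `boto3.resource("dynamodb").Table(...)`) or against test doubles. `receive_message`, `delete_message_batch`, and `Table.put_item` are the boto3 calls of those names.

```python
import json
import re
import threading
from typing import Any, Callable

TEAMS = {"team-a", "team-b"}  # hypothetical placeholder team names


def is_valid_vote(vote: dict) -> bool:
    """Hypothetical rule: a 10-13 digit phone number and a known team name."""
    phone = str(vote.get("phone", ""))
    team = vote.get("team")
    return bool(re.fullmatch(r"[0-9]{10,13}", phone)) and isinstance(team, str) and team in TEAMS


def process_batch(sqs: Any, queue_url: str, valid_table: Any, invalid_table: Any) -> int:
    """One pass of the per-thread loop; returns how many messages were handled."""
    # Read up to 10 messages (the SQS per-call maximum), long-polling up to 20 s.
    resp = sqs.receive_message(QueueUrl=queue_url, MaxNumberOfMessages=10, WaitTimeSeconds=20)
    messages = resp.get("Messages", [])
    if not messages:
        return 0
    entries = []
    for i, msg in enumerate(messages):
        try:
            vote = json.loads(msg["Body"])
        except json.JSONDecodeError:
            vote = None
        if isinstance(vote, dict) and is_valid_vote(vote):
            # Phone number as the key: a redelivered message overwrites the same item.
            valid_table.put_item(Item={"phone": str(vote["phone"]), "team": vote["team"]})
        else:
            invalid_table.put_item(Item={"message_id": msg["MessageId"], "body": msg["Body"]})
        entries.append({"Id": str(i), "ReceiptHandle": msg["ReceiptHandle"]})
    # Delete only after saving: if the worker dies first, the messages become
    # visible again after the visibility timeout and are processed again.
    sqs.delete_message_batch(QueueUrl=queue_url, Entries=entries)
    return len(messages)


def start_workers(n_threads: int, make_clients: Callable[[], tuple],
                  queue_url: str, stop: threading.Event) -> list:
    """Start n threads; each builds its own (sqs, valid_table, invalid_table) via make_clients()."""
    def loop() -> None:
        sqs, valid_table, invalid_table = make_clients()
        while not stop.is_set():
            process_batch(sqs, queue_url, valid_table, invalid_table)

    threads = [threading.Thread(target=loop, daemon=True) for _ in range(n_threads)]
    for t in threads:
        t.start()
    return threads
```

Deleting only after the DynamoDB write gives at-least-once processing, which is why using the phone number as the key matters: reprocessing a redelivered vote rewrites the same item instead of counting it twice. boto3's documentation advises against sharing resource objects across threads, hence the per-thread `make_clients` factory.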
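For the dashboard (slides 23 and 24), one simple way to count votes is a paginated Scan of the valid-votes table. This is a minimal sketch assuming each item carries a hypothetical `team` attribute; the talk does not say how its dashboard queried DynamoDB. With boto3's `Table.scan`, each response holds `Items` and, when more pages remain, a `LastEvaluatedKey` that is passed back as `ExclusiveStartKey` on the next call.

```python
from collections import Counter
from typing import Any


def tally_votes(valid_table: Any) -> Counter:
    """Count votes per team by scanning the whole valid-votes table page by page."""
    counts: Counter = Counter()
    # "#t" is an expression attribute name, so "team" cannot clash with a reserved word.
    kwargs: dict = {"ProjectionExpression": "#t", "ExpressionAttributeNames": {"#t": "team"}}
    while True:
        page = valid_table.scan(**kwargs)
        # Items without a team attribute come back empty under the projection; skip them.
        counts.update(item["team"] for item in page.get("Items", []) if "team" in item)
        last_key = page.get("LastEvaluatedKey")
        if last_key is None:  # no more pages
            return counts
        kwargs["ExclusiveStartKey"] = last_key
```

A full Scan reads every item and consumes read capacity on each refresh. For a frequently refreshed dashboard, per-team counters maintained by the workers (for example with an `UpdateItem` ADD expression) would likely be cheaper.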
