ARC301 - Controlling the Flood: Massive Message Processing with Amazon SQS and Amazon DynamoDB
Massive Message Processing with Amazon SQS and Amazon DynamoDB (ARC301) | AWS re:Invent 2013
Amazon Simple Queue Service (SQS) and Amazon DynamoDB together form a fast, reliable, and scalable layer for receiving and processing high volumes of messages, built on distributed, highly available architectures. We propose a complete system that can handle any volume of data at any level of throughput, without losing messages and without requiring other services to be always available. It also enables applications to process messages asynchronously and to add compute resources based on the number of messages enqueued.
The architecture helps applications meet predefined SLAs, since more workers can be added to improve overall performance, and it lowers total cost because the extra workers run briefly and only when they are required.


  1. ARC301 - Controlling the Flood: Massive Message Processing with Amazon SQS and Amazon DynamoDB. Ari Dias Neto, Ecosystem Solutions Architect. November 14, 2013. © 2013 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon.com, Inc.
  2. Who am I? • The Mailman from Brazil – delivering messages around the world!
  3. Returning all the messages…
  4. How many mailmen? When? How long?
  5. Who am I? Ari Dias Neto – Ecosystem Solutions Architect. What are we going to do? We are going to design and build an application to handle any volume of messages! Right now!
  6. Scenario – Super Bowl. Promotion: who is going to win?
  7. Promotion • Subscription based on SMS – the cellphone number is the key. Requirements: • We cannot lose any message • We need to process all the valid messages • Log all the invalid messages and errors • Beautiful dashboard at the end • We must process all the messages during the event!
  8. Who is going to be in the front line? Fast! Scalable! Reliable! Simple! Fully managed!
  9. Amazon Simple Queue Service (SQS) – a fully managed queue service.
  10. Any volume of data, at any level of throughput.
  11. We cannot lose any message.
  12. No up-front or fixed expenses.
  13. Architecture – Starting with SQS. We have received all the messages; now we need to process them all.
  14. Architecture – Amazon EC2 instances pulling from SQS. But how many instances?
  15. Architecture – Multithreaded application on the EC2 instances: reduce costs and increase performance.
  16. Architecture: worker threads on each instance consuming from SQS.
  17. But how many instances do we need? One m1.xlarge EC2 instance handles about 100k msgs/minute, so 10 instances handle about 1M msgs/minute – that is, 5M messages in 5 minutes.
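The sizing arithmetic on this slide can be checked directly. A minimal sketch, assuming (as the slide states) that one m1.xlarge worker drains roughly 100,000 messages per minute; the class and method names are illustrative, not from the session's code:

```java
// Capacity math from the slide: one m1.xlarge worker is assumed to process
// ~100,000 messages/minute, so ten of them reach ~1,000,000 messages/minute
// and clear 5M messages in 5 minutes.
public class CapacityMath {
    static final long MSGS_PER_INSTANCE_PER_MINUTE = 100_000L; // assumed per-instance rate

    /** Instances required to drain `backlog` messages within `minutes`. */
    public static long instancesNeeded(long backlog, long minutes) {
        long perInstance = MSGS_PER_INSTANCE_PER_MINUTE * minutes;
        return (backlog + perInstance - 1) / perInstance; // ceiling division
    }

    public static void main(String[] args) {
        // 5M messages in 5 minutes -> 10 instances, matching the slide.
        System.out.println(instancesNeeded(5_000_000L, 5));
    }
}
```

The same formula answers the "10-minute challenge" later in the talk: doubling the deadline halves the fleet.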
  18. Architecture: Auto Scaling based on the number of messages in the queue (SQS + Auto Scaling group).
  19. Architecture: where should we save all the messages? High throughput needed (SQS + Auto Scaling group).
  20. Amazon DynamoDB
  21. Two tables in DynamoDB: valid-votes and invalid-votes.
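Since the cellphone number is the key, a worker only has to decide which of the two tables each vote lands in. A minimal sketch of that routing decision; the "&lt;phone&gt;;&lt;team&gt;" message format and the digit-count rule are assumptions for illustration (the talk does not show the real payload), and the returned table name stands in for an actual DynamoDB PutItem call:

```java
// Routes each SMS vote to "valid-votes" or "invalid-votes". A vote is
// treated as valid when the cellphone number (the table key) is 10-13
// digits and a non-empty team name follows. The "<phone>;<team>" format
// is a hypothetical payload, not the one used in the session.
public class VoteRouter {
    public static String tableFor(String message) {
        String[] parts = message.split(";", 2);
        if (parts.length == 2
                && parts[0].matches("\\d{10,13}")   // cellphone number is the key
                && !parts[1].trim().isEmpty()) {
            return "valid-votes";
        }
        return "invalid-votes";
    }
}
```

Keeping invalid votes in their own table satisfies the "log all the invalid messages and errors" requirement without slowing down the hot valid-votes path.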
  22. Architecture: the valid-votes and invalid-votes tables, fed by the SQS queue and the Auto Scaling group.
  23. The Dashboard
  24. Final architecture: workers in an Auto Scaling group read from SQS and write to DynamoDB; a web dashboard runs in an AWS Elastic Beanstalk container.
  25. Benefits • Ready for any level of throughput (SQS) • Ready for any required SLA (Auto Scaling and EC2) • Low cost: a fully managed queue service, infrastructure sized to the required SLA, and infrastructure needed only for a small period of time.
  26. The challenge! Process all the messages from the queue in 10 minutes!
  27. Let's go deep! Let's code!
  28. Each thread: connect to the SQS queue → read up to 10 messages → validate each message → save it as valid or invalid in DynamoDB → mark it as "read" in the queue (delete it).
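The per-thread loop on this slide maps onto the standard SQS consumption pattern: receive up to 10 messages, validate, persist each result, then delete the message so it is not redelivered. A self-contained skeleton of that loop; an in-memory Queue stands in for the SQS client (with the real AWS SDK, the receive and delete steps would be ReceiveMessage with MaxNumberOfMessages=10 and DeleteMessage), and the validity rule is illustrative:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Skeleton of the per-thread worker loop from the slide: read up to 10
// messages, validate each, save it as valid or invalid, then mark it as
// "read" (in SQS terms: delete it) so it is not redelivered.
public class Worker {
    static final int BATCH_SIZE = 10; // SQS returns at most 10 messages per receive

    /** Drains the queue; returns {validCount, invalidCount}. */
    public static int[] process(Queue<String> queue) {
        int valid = 0, invalid = 0;
        while (!queue.isEmpty()) {
            // 1. Read up to 10 messages.
            List<String> batch = new ArrayList<>();
            for (int i = 0; i < BATCH_SIZE && !queue.isEmpty(); i++) {
                batch.add(queue.poll()); // poll() also "deletes" in this sketch
            }
            // 2. Validate each message and save it as valid or invalid.
            for (String msg : batch) {
                if (isValid(msg)) valid++;   // would PutItem into valid-votes
                else invalid++;              // would PutItem into invalid-votes
            }
        }
        return new int[] { valid, invalid };
    }

    // Illustrative rule: a vote is valid when the phone-number key is numeric.
    static boolean isValid(String msg) {
        return msg.matches("\\d+;.+");
    }
}
```

Deleting a message only after the DynamoDB write succeeds is what makes the "we cannot lose any message" requirement hold: if a worker dies mid-batch, the undeleted messages reappear after the visibility timeout.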
  30. Steps to deploy it on AWS: ✔ Create the queue (queue name: votes) ✔ Upload the application to S3: s3-sa-east-1.amazonaws.com/arineto/processor.jar ✔ Create an AMI with a JRE (image ID: ami-05355a6c) • Create the bootstrap script: userdata.txt • Create the launch configuration • Create the Auto Scaling group • Create the alarms • Launch it!
  31. The Company • BigData Corp. was founded to help companies solve the challenges associated with big data, from collection to processing to information and knowledge extraction.
  32. The Challenge • "How many e-commerce websites exist in your continent? Can we monitor them on a consistent basis?" – Build a crawling process that can answer this question in a cost-effective and speedy manner.
  33. Architecture • Spot Instances + SQS + S3 = magic – Spot Instances let us optimize processing costs – Amazon SQS lets us orchestrate the process in a distributed and asynchronous manner – Amazon Simple Storage Service (S3) stores intermediate and final processing results.
  34. Architecture: a Maestro (reserved instance) pushes the list of crawl URLs and drives a command-and-control queue; main workers (Spot Instances) execute the crawling and process data; secondary workers (queue listeners, also Spot Instances) consume secondary work queues of processed data, reprocess it, query additional services, and store the results in a MongoDB cluster.
  35. Architecture (3) • Message volumes – Processing starts by uploading 10MM+ messages – Each processed message may generate up to 10 new intermediate messages – Peak processing of 70K messages/second • Command-and-control queue – This queue enables us to adjust processing as we go and to request status checks from instances.
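The fan-out described here compounds quickly: in the worst case, 10MM seed messages each producing 10 intermediate messages put over 100MM messages in flight. A small sketch of that estimate; the single-level fan-out model is an assumption, since the talk does not say how deep the message chains go:

```java
// Worst-case message volume for the crawl on this slide: 10MM+ seed
// messages, each of which may generate up to 10 intermediate messages.
// With one level of fan-out, the total is seeds * (1 + perMessageFanout).
public class FanOut {
    public static long worstCaseVolume(long seeds, long perMessageFanout) {
        return seeds * (1 + perMessageFanout);
    }

    /** Seconds to drain `total` messages at `ratePerSecond` (ceiling). */
    public static long drainSeconds(long total, long ratePerSecond) {
        return (total + ratePerSecond - 1) / ratePerSecond;
    }
}
```

At the quoted peak of 70K messages/second, even the worst-case volume drains in well under an hour, which is what makes short-lived Spot fleets economical here.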
  36. Results (1): [bar chart comparing estimated monthly cost without AWS vs. cost with AWS over 12 months, ranging up to roughly $900,000]
  37. Results (2) • 2+ PB of data processed • 40+ billion web pages visited and parsed • 500+ services and technologies mapped • A completely new view of the web market.
  38. Please give us your feedback on this presentation (ARC301). As a thank you, we will select prize winners daily for completed surveys!
