This document provides an overview of a presentation on using Amazon SQS and DynamoDB to process massive volumes of messages. The presentation covers using SQS to reliably queue messages at scale, processing them in parallel on auto-scaled EC2 instances, and storing results in DynamoDB. It also describes an example application the presenter built to handle promotional votes for a Super Bowl winner, scaling up to process millions of SMS votes within a tight 10-minute window.
5. Who am I?
Ari Dias Neto – Ecosystem Solutions Architect
What are we going to do?
We are going to design and build an application to handle
any volume of messages! Right now!
7. Promotion
• Subscription based on SMS
– Cellphone number is the key
Requirements
• We cannot lose any message
• We need to process all the valid messages
• Log all the invalid messages and errors
• Beautiful dashboard at the end
• We must process all the messages during the event!
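The "process all valid messages, log the invalid ones" requirement can be made concrete with a small sketch. Everything below (the phone-number pattern, the promotion choices, the function name) is an assumption for illustration; the presentation only states that the cellphone number is the key and that invalid messages must be logged:

```python
import logging
import re

# Hypothetical validator for incoming SMS vote messages. The phone-number
# format and the promotion options are assumptions, not from the deck.
PHONE_RE = re.compile(r"^\+?\d{10,13}$")
VALID_CHOICES = {"TEAM_A", "TEAM_B"}  # assumed promotion options

def process_vote(phone: str, body: str, valid_votes: dict, log=logging) -> bool:
    """Count a valid vote keyed by cellphone number; log anything invalid."""
    choice = body.strip().upper()
    if not PHONE_RE.match(phone):
        log.warning("invalid phone number: %r", phone)
        return False
    if choice not in VALID_CHOICES:
        log.warning("invalid vote %r from %s", body, phone)
        return False
    # One vote per cellphone number -- the number is the key.
    valid_votes[phone] = choice
    return True
```

Returning `False` instead of raising keeps the worker loop alive: a bad message is logged and skipped, never dropped silently and never fatal.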
8. Who is going to be on the front line?
Fast!
Scalable!
Reliable!
Simple!
Fully managed!
25. Benefits
• Ready for any level of throughput
  – SQS
• Ready for any required SLA
  – Auto Scaling and EC2
• Low cost
  – Fully managed queue service
  – Infrastructure is based on the required SLA
  – Infrastructure needed only for a small period of time
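"Infrastructure is based on the required SLA" is back-of-the-envelope arithmetic: divide the message volume by the SLA window to get a required rate, then divide by per-instance throughput. The numbers in the comment below are assumptions for illustration; the deck only gives the shape of the calculation (millions of votes, a tight 10-minute window):

```python
import math

def fleet_size(total_messages: int, window_seconds: int,
               per_instance_rate: float) -> int:
    """Workers needed to drain the queue within the SLA window.

    All concrete inputs are assumptions; the presentation only states
    that the infrastructure is sized from the required SLA.
    """
    required_rate = total_messages / window_seconds
    return math.ceil(required_rate / per_instance_rate)

# e.g. 5 million votes in a 10-minute window at 500 msg/s per instance:
# 5,000,000 / 600 s ≈ 8,334 msg/s → 17 instances
```

Because the fleet only exists for the duration of the event, the cost is proportional to that short window, which is the "low cost" benefit above.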
30. Steps to deploy it on AWS
✔Create the queue. Queue name: votes
✔Upload application to S3: s3-sa-east-1.amazonaws.com/arineto/processor.jar
✔Create AMI with JRE. Image ID: ami-05355a6c
Create bootstrap script: userdata.txt
Create launch configuration
Create Auto Scaling Group
Create alarms
Launch it!
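The deployment steps above can be sketched with boto3, the AWS SDK for Python. This is an illustrative provisioning fragment, not the presenter's actual scripts: it requires configured AWS credentials, would create billable resources, and the instance type, group sizes, user data, and alarm threshold are all assumptions. Only the queue name `votes` and the AMI `ami-05355a6c` come from the slide:

```python
import boto3

sqs = boto3.client("sqs")
autoscaling = boto3.client("autoscaling")
cloudwatch = boto3.client("cloudwatch")

# Step 1: create the queue.
sqs.create_queue(QueueName="votes")

# Steps 2-3 (jar on S3, JRE AMI) are assumed done already.

# Steps 4-5: launch configuration with a bootstrap script as user data.
# userdata.txt would fetch processor.jar from S3 and run it (contents assumed).
autoscaling.create_launch_configuration(
    LaunchConfigurationName="vote-processor",
    ImageId="ami-05355a6c",
    InstanceType="m1.large",          # assumed
    UserData="#!/bin/bash\n",         # placeholder for userdata.txt
)

# Step 6: Auto Scaling group.
autoscaling.create_auto_scaling_group(
    AutoScalingGroupName="vote-processors",
    LaunchConfigurationName="vote-processor",
    MinSize=0,
    MaxSize=20,                       # assumed ceiling
    AvailabilityZones=["sa-east-1a"],
)

# Step 7: alarm on queue depth to drive scale-out (threshold assumed).
cloudwatch.put_metric_alarm(
    AlarmName="votes-queue-depth",
    Namespace="AWS/SQS",
    MetricName="ApproximateNumberOfMessagesVisible",
    Dimensions=[{"Name": "QueueName", "Value": "votes"}],
    Statistic="Sum",
    Period=60,
    EvaluationPeriods=1,
    Threshold=1000,
    ComparisonOperator="GreaterThanThreshold",
)
```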
32. The Company
• BigData Corp. was founded to help companies
solve the challenges associated with big data,
from collection to processing to information and
knowledge extraction.
33. The Challenge
• “How many e-commerce websites exist in your
continent? Can we monitor them on a consistent
basis?”
– Build a crawling process that can answer this question in a cost-effective and speedy manner.
34. Architecture
• Spot Instances + SQS + S3 = Magic
– Spot Instances allow us to optimize processing costs
– Amazon SQS allows us to orchestrate the process in a
distributed and asynchronous manner
– Amazon Simple Storage Service (S3) facilitates the storage of
intermediate and final processing results
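The worker loop that SQS orchestrates here can be shown with a local stand-in. In this sketch `queue.Queue` replaces SQS purely to illustrate the pattern: each worker pulls a crawl URL (ReceiveMessage), does its work, pushes the result onward, and acknowledges (DeleteMessage). All names are hypothetical:

```python
import queue
import threading

# Local stand-ins for the SQS work queue and the downstream results queue.
work_q: "queue.Queue" = queue.Queue()
results_q: "queue.Queue" = queue.Queue()

def worker():
    """One Spot-Instance-style worker: loop until a shutdown sentinel."""
    while True:
        url = work_q.get()               # SQS: ReceiveMessage
        if url is None:                  # sentinel: stop this worker
            work_q.task_done()
            return
        results_q.put(f"crawled:{url}")  # downstream queue / S3 in the talk
        work_q.task_done()               # SQS: DeleteMessage

def run(urls, n_workers=4):
    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for u in urls:
        work_q.put(u)
    for _ in threads:                    # one sentinel per worker
        work_q.put(None)
    for t in threads:
        t.join()
    return sorted(results_q.queue)
```

Because every worker runs the same pull loop against a shared queue, instances can join or disappear (as Spot Instances do) without any coordination beyond the queue itself.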
35. Architecture
[Diagram] A Maestro node (reserved instance) seeds the list of crawl URLs and drives a Command and Control Queue. Main Workers (Spot Instances) execute the crawling and process data, feeding secondary work queues with processed data. Secondary Workers (queue listeners, also Spot Instances) reprocess data, query additional services, and store the data in a MongoDB cluster.
36. Architecture (3)
• Message Volumes
– Processing starts by uploading 10MM+ messages
– Each processed message may generate up to 10 new
intermediate messages
– Peak processing of 70K messages / second
• Command & Control Queue
– This queue enables us to adjust processing as we go and
request status checks from instances
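A command-and-control consumer is just a dispatcher over a small command vocabulary. The presentation only says the queue is used to adjust processing and request status checks; the command names and the stats fields in this sketch are assumptions:

```python
# Hypothetical handler for messages arriving on the Command & Control queue.
def handle_control_message(command: str, state: dict) -> dict:
    """Dispatch one control command against this instance's state."""
    if command == "status":
        # Report progress back (fields assumed for illustration).
        return {"processed": state.get("processed", 0),
                "errors": state.get("errors", 0)}
    if command == "pause":
        state["paused"] = True
        return {"ack": "paused"}
    if command == "resume":
        state["paused"] = False
        return {"ack": "resumed"}
    return {"error": f"unknown command: {command}"}
```

Keeping control traffic on its own queue means status checks and throttling commands are never stuck behind the 10MM+ data messages in the work queues.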