3. Pictr
Photo website. You upload photos; they are resized, watermarked, and put on the homepage.
Photos can be uploaded by the public
Photos are converted asynchronously by separate processing servers
Very simple app, but hopefully it shows some ideas of how cloud computing apps work
4. Why Pictr?
Lots of Disk Storage
Lots of Bandwidth
Offline processing
I want to manage and host as little as possible
It will be an instant hit. Needs to scale from day 1
6. Why the Cloud?
I don’t want to build or configure hardware.
I don’t want to do much capacity planning.
I want to pay as I go for servers and service.
Offload as much as possible.
7. How is it built?
Rails application running on Amazon EC2 servers
Asynchronous image processing (resizing, effects)
Database is Amazon SimpleDB
Storage is Amazon S3
CDN is Amazon CloudFront
Messaging queue is Amazon SQS
ImageMagick is used for image manipulation
9. Why Amazon?
Simple. Right now, they are the market leader...
But the competition is good.
Tight integration between services
RESTful APIs for their services
Lots of documentation and libraries
10. Amazon EC2
Servers “on demand”.
Pay by the hour: $0.10 - $0.80/hour
Non-persistent: local instance storage is lost when the instance terminates.
Forces you to automate your configuration.
RightScale helps a lot with this.
Elastic IP can give you a static IP
Elastic Block Store gives you a persistent data store
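"Pay by the hour" reduces to simple arithmetic: instances times billed hours times the hourly rate. The minimal sketch below assumes the EC2 billing rule of that era, under which each partial instance-hour was charged as a full hour. The method `ec2_cost` and the workload numbers are hypothetical. The rate comes from the low end of the slide's range.

```ruby
# Minimal sketch: estimate an EC2 bill under hourly billing.
# Assumption: each partial instance-hour is billed as a full hour
# (EC2's per-hour billing model at the time of this talk).
def ec2_cost(instances:, hours_each:, rate_per_hour:)
  billed_hours = hours_each.ceil          # 2.5 hours is billed as 3
  instances * billed_hours * rate_per_hour
end

# Hypothetical workload: 4 converter servers at $0.10/hour,
# each running for 2.5 hours to absorb a traffic spike.
spike = ec2_cost(instances: 4, hours_each: 2.5, rate_per_hour: 0.10)
puts format("Spike cost: $%.2f", spike)   # 4 * 3 * 0.10
```

Rounding up to whole hours explains why short-lived burst servers cost slightly more than the raw runtime suggests.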
11. Amazon S3
Oldest of the publicly available web services
Scalable, reliable data storage mechanism
Bucket/object concept
Pay for the data you are storing, as well as bandwidth in and out
$0.15/GB stored per month. $0.10/GB transferred. $0.01 per 1,000 PUT requests; GETs are cheaper ($0.01 per 10,000).
Very fine grained permissions
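This minimal estimator shows how the three S3 charges (storage, transfer, requests) add up. The default rates are the 2009-era figures: $0.15/GB-month stored and $0.10/GB transferred. Requests are priced per batch: $0.01 per 1,000 PUTs and $0.01 per 10,000 GETs. Current prices differ. The struct, method names, and monthly workload are made up for illustration.

```ruby
# Minimal sketch: monthly S3 bill = storage + transfer + requests.
# Rates are assumed 2009-era prices; transfer uses a single rate as on the slide.
S3Prices = Struct.new(:per_gb_stored, :per_gb_transferred,
                      :per_1k_puts, :per_10k_gets)
DEFAULT_PRICES = S3Prices.new(0.15, 0.10, 0.01, 0.01)

def s3_monthly_cost(gb_stored:, gb_transferred:, put_requests:, get_requests:,
                    prices: DEFAULT_PRICES)
  storage  = gb_stored * prices.per_gb_stored
  transfer = gb_transferred * prices.per_gb_transferred
  requests = (put_requests / 1_000.0) * prices.per_1k_puts +
             (get_requests / 10_000.0) * prices.per_10k_gets
  storage + transfer + requests
end

# Hypothetical Pictr month: 50 GB of photos, 200 GB served,
# 30,000 uploads/conversions written, 2 million image reads.
cost = s3_monthly_cost(gb_stored: 50, gb_transferred: 200,
                       put_requests: 30_000, get_requests: 2_000_000)
puts format("Estimated S3 bill: $%.2f", cost)
```

For a photo site, transfer usually dominates, which is one reason to put CloudFront in front of the bucket.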
12. CloudFront
Similar to Limelight and Akamai
Low latency, high data rate transfers
8 edge locations in the US
Origin server is an S3 bucket
24-hour object expiration
Very simple API
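require 'right_aws'
acf = RightAws::AcfInterface.new  # keys default to AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY
# Arguments: origin S3 bucket, comment, enabled
acf.create_distribution('pictr.s3.amazonaws.com', 'Woo-Hoo!', true)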
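The S3 bucket is the origin, so the same object key is reachable in two ways. Writes go to the bucket, and public reads can go through the distribution's host name. In this small sketch, the bucket host is taken from the slide and `d1234example.cloudfront.net` is a placeholder for whatever domain CloudFront assigns. The helper names are hypothetical.

```ruby
require 'erb'

ORIGIN_HOST       = "pictr.s3.amazonaws.com"        # bucket from the slide
DISTRIBUTION_HOST = "d1234example.cloudfront.net"   # placeholder domain

# Escape each path segment but keep the "/" separators of the S3 key.
def encode_key(key)
  key.split("/").map { |segment| ERB::Util.url_encode(segment) }.join("/")
end

# Direct URL on the origin bucket (used for uploads and debugging).
def origin_url(key)
  "http://#{ORIGIN_HOST}/#{encode_key(key)}"
end

# URL served from the edge locations (used on the homepage).
def edge_url(key)
  "http://#{DISTRIBUTION_HOST}/#{encode_key(key)}"
end

puts edge_url("processed/f1_paint.jpg")
```

Edges may keep an object for up to a day. Writing each result under a new key (as Pictr's `processed/f1_paint.jpg` naming does) avoids serving a stale copy of an overwritten file.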
13. Amazon Simple Queue Service
Reliable, scalable queue
PUSH and PULL messages
Messages can be up to 8K of text
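Received messages are hidden (“locked”) for a visibility timeout while they are processed, so other readers normally won't see them. Delete a message once the work is done, or it becomes visible again.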
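An in-memory model makes the "locking" behaviour easier to see. This is an illustration only, not the SQS API. `ToyQueue` and its clock injection are invented for the sketch. The 8K message limit comes from the slide.

```ruby
# Toy model of SQS-style locking (visibility timeout), not the real API:
# receive hides a message for `visibility` seconds; delete removes it;
# a message that is never deleted becomes visible again.
class ToyQueue
  MAX_MESSAGE_BYTES = 8 * 1024                 # the slide's 8K limit
  Message = Struct.new(:id, :body, :visible_at)

  def initialize(visibility: 30, clock: -> { Time.now.to_f })
    @visibility = visibility
    @clock = clock
    @messages = []
    @next_id = 0
  end

  def send_message(body)
    raise ArgumentError, "message exceeds 8K" if body.bytesize > MAX_MESSAGE_BYTES
    @next_id += 1
    @messages << Message.new(@next_id, body, 0.0)
  end

  # Return the first visible message and hide it for the timeout.
  def receive
    now = @clock.call
    msg = @messages.find { |m| m.visible_at <= now }
    return nil unless msg
    msg.visible_at = now + @visibility
    msg
  end

  def delete(msg)
    @messages.delete_if { |m| m.id == msg.id }
  end
end

# Usage with a fake clock so the timeout can be shown without sleeping.
now = 0.0
queue = ToyQueue.new(visibility: 30, clock: -> { now })
queue.send_message('{"key":"uploads/f1","name":"paint"}')
job = queue.receive            # worker A takes the message
p queue.receive                # nil: the message is "locked"
now = 31.0                     # worker A died without deleting it...
p queue.receive.equal?(job)    # true: it is visible again
queue.delete(job)
```

Real SQS delivers at least once, so a message can occasionally arrive twice. Conversion jobs are easiest to handle if running one twice is harmless.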
14. Amazon SimpleDB
Simple (but powerful) database
Non-relational. No tables, only “domains”. No joins.
No schema. Define it as you go.
All data is stored as strings.
An item can have up to 256 attribute name/value pairs; one attribute can hold several values.
Automatically indexes all your data.
Not exactly a speed demon.
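To make the data model concrete, here is a toy item. It has no schema, every value becomes a string, an attribute can carry several values, and there is a cap of 256 name/value pairs per item (SimpleDB's documented per-item limit). `ToyItem` is a made-up class for illustration and is not part of RightAws or SimpleDB.

```ruby
# Toy model of a SimpleDB item (illustration only, not the SimpleDB API).
class ToyItem
  MAX_PAIRS = 256

  attr_reader :name, :attributes

  def initialize(name)
    @name = name
    @attributes = {}            # attribute name => array of string values
  end

  def add(attribute, value)
    key = attribute.to_s
    str = value.to_s            # everything is stored as a string
    return self if @attributes.fetch(key, []).include?(str)  # toy keeps each value once
    raise ArgumentError, "item would exceed #{MAX_PAIRS} name/value pairs" if pair_count >= MAX_PAIRS
    (@attributes[key] ||= []) << str
    self
  end

  def pair_count
    @attributes.values.sum(&:size)
  end
end

tv = ToyItem.new("tv-1")
tv.add(:Category, "Electronics").add(:Size, 47).add(:Category, "Televisions")
p tv.attributes   # Category holds two values; Size is the string "47"
```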
16. ActiveSDB
require 'right_aws'
RightAws::ActiveSdb.establish_connection
class Item < RightAws::ActiveSdb::Base; end
Item.create_domain
item = Item.create :Category => "Electronics", :Size => 47, :Manufacturer => "Sony"
# Add a new "column"
item["Illumination"] = "LCD"
item.save
# New category. Now, category is an "array"
item["Category"] << "Televisions"
item.save
items = Item.select(:all, :conditions => ["Size = ?", 47])
# => [#<Item:0x1a8a5ec @new_record=false, @attributes={"Size"=>["47"], "Category"=>["Electronics",
#      "Televisions"], "id"=>"788b5192-8a85-11de-a035-00264a1272e2", "Manufacturer"=>["Sony"],
#      "Illumination"=>["LCD"]}>]
17. SimpleDB caveats
All data is stored as a string.
Response cannot be > 1MB
Maximum query execution time: 5 sec
Eventually consistent (a tradeoff for availability)
Speed (every call is a network round trip)
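Because every value is a string, comparisons and sorting are lexicographic: "100" sorts before "47". A common workaround is to store numbers zero-padded to a fixed width, with an offset added so that negative values also sort correctly. In this sketch the width and offset are illustrative choices, and the helper names are hypothetical.

```ruby
# Encode integers so that string order matches numeric order.
WIDTH  = 10
OFFSET = 1_000_000_000   # shifts negatives into the non-negative range

def encode_number(n)
  shifted = n + OFFSET
  raise ArgumentError, "out of range: #{n}" unless shifted.between?(0, 10**WIDTH - 1)
  shifted.to_s.rjust(WIDTH, "0")
end

def decode_number(s)
  s.to_i - OFFSET
end

sizes = [47, 100, 5, -3]
p sizes.map(&:to_s).sort                                             # ["-3", "100", "47", "5"]
p sizes.map { |n| encode_number(n) }.sort.map { |s| decode_number(s) } # [-3, 5, 47, 100]
```

In the ActiveSdb example, `Size` comes back as "47". A range query such as `Size > '100'` on unpadded values compares strings, not numbers.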
18. What does all this mean?
I worry less about scaling
I worry more about configuration management
and automation
Database performance is very consistent
S3 and CloudFront handle static file serving
Minimal upfront investment
Only pay for what I need
19. Diagram: Users, Web Servers, Converter Servers, S3, SQS, SimpleDB
20. Step 1: Upload
Diagram: the browser POSTs pic.jpg directly to S3; on success, S3 redirects to the EC2 instance, which initiates processing
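A direct browser upload to S3 with a redirect on success relies on a signed POST policy. The web server signs a policy document, and the HTML form sends the file straight to the bucket. The sketch below follows the original (signature version 2) scheme: a base64-encoded JSON policy signed with HMAC-SHA1. Newer S3 regions expect Signature Version 4 instead. The bucket, prefix, redirect URL, credentials, and size cap are placeholders, and `s3_upload_fields` is a hypothetical helper.

```ruby
require 'base64'
require 'json'
require 'openssl'
require 'time'

# Hidden form fields for a POST upload straight to S3 (signature v2 scheme).
def s3_upload_fields(bucket:, key_prefix:, redirect_url:, access_key_id:,
                     secret_key:, expires_in: 3600)
  policy = {
    "expiration" => (Time.now.utc + expires_in).iso8601(3),
    "conditions" => [
      { "bucket" => bucket },
      ["starts-with", "$key", key_prefix],
      { "success_action_redirect" => redirect_url },
      ["content-length-range", 0, 10 * 1024 * 1024]   # assumed 10 MB cap
    ]
  }
  encoded_policy = Base64.strict_encode64(policy.to_json)
  signature = Base64.strict_encode64(
    OpenSSL::HMAC.digest(OpenSSL::Digest.new("SHA1"), secret_key, encoded_policy)
  )
  {
    "key" => "#{key_prefix}${filename}",   # S3 substitutes the uploaded file name
    "AWSAccessKeyId" => access_key_id,
    "success_action_redirect" => redirect_url,
    "policy" => encoded_policy,
    "signature" => signature
  }
end

fields = s3_upload_fields(bucket: "pictr", key_prefix: "uploads/",
                          redirect_url: "http://example.com/uploaded",
                          access_key_id: "AKIDEXAMPLE", secret_key: "secret")
fields.each { |name, value| puts "#{name}=#{value[0, 40]}" }
```

The form posts these fields plus a final `file` field to the bucket's URL. The redirect target is where the Rails app picks up the new key and starts processing.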
21. Step 2: Queue the jobs
require 'right_aws'
require 'json'
# Jobs are created (key is the S3 key of the uploaded original, e.g. "uploads/f1")
jobs = [ConvertJob.new(key, "paint", "400", "-paint 5"),
        ConvertJob.new(key, "monochrome", "400", "-monochrome")]
# Push the jobs into the queue (each message must stay under the 8K limit)
sqs = RightAws::SqsGen2.new
convert_queue = sqs.queue("convert")
jobs.each do |job|
  convert_queue.send_message job.to_json
end
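`ConvertJob` itself is not shown. Here is one hypothetical reconstruction; the field names `key`, `name`, `width`, and `options` are guesses from the constructor calls. It serializes to JSON for the queue and can be rebuilt from a message body (plain `JSON.parse` only yields a Hash). It also derives the output key used in the SimpleDB record and builds an ImageMagick `convert` command line. Actually downloading from S3 and running the command are left out.

```ruby
require 'json'
require 'shellwords'

# Hypothetical ConvertJob; field names are assumptions.
ConvertJob = Struct.new(:key, :name, :width, :options) do
  def to_json(*args)
    to_h.to_json(*args)
  end

  # Rebuild a job from a queue message body.
  def self.from_json(text)
    data = JSON.parse(text)
    new(data["key"], data["name"], data["width"], data["options"])
  end

  # S3 key for the converted image, e.g. "processed/f1_paint.jpg".
  def output_key
    "processed/#{File.basename(key)}_#{name}.jpg"
  end

  # ImageMagick argv: resize to `width` pixels wide, then apply the effect.
  def command(input_path, output_path)
    ["convert", input_path, "-resize", width, *options.shellsplit, output_path]
  end
end

job = ConvertJob.new("uploads/f1", "paint", "400", "-paint 5")
message_body = job.to_json
p ConvertJob.from_json(message_body).command("f1.jpg", "f1_paint.jpg")
```

Passing the command as an argument array (for example to `system(*job.command(...))`) avoids going through a shell with user-supplied file names.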
22. Step 3: Create SimpleDB record
# class Picture < RightAws::ActiveSdb::Base
# ...
# end
picture = Picture.create(:key => key)
picture["total_conversions"] = ["paint", "monochrome"]
picture.save
23. SimpleDB record
# Before conversion
{"key"=>["uploads/f1"],
"total_conversions"=>["paint", "monochrome"]}
# After conversion
{"key"=>["uploads/f1"],
"total_conversions"=>["paint", "monochrome"],
"paint"=>["processed/f1_paint.jpg"],
"monochrome"=>["processed/f1_monochrome.jpg"]
}
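The before/after records suggest a simple completion test. A picture is fully processed once every name in `total_conversions` has its own attribute. This sketch works on plain hashes shaped like the records above. The helper names are hypothetical.

```ruby
# Names listed in total_conversions that have no output attribute yet.
def pending_conversions(record)
  record.fetch("total_conversions", []).reject { |name| record.key?(name) }
end

def fully_processed?(record)
  pending_conversions(record).empty?
end

before = { "key" => ["uploads/f1"], "total_conversions" => ["paint", "monochrome"] }
after  = before.merge("paint" => ["processed/f1_paint.jpg"],
                      "monochrome" => ["processed/f1_monochrome.jpg"])
p pending_conversions(before)
p fully_processed?(after)
```

SimpleDB is eventually consistent, so a check made right after a converter writes its attribute may briefly still report that conversion as pending.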
24. Step 4: Processor Daemon
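require 'right_aws'
require 'json'
# Connect once, outside the polling loop
sqs = RightAws::SqsGen2.new
convert_queue = sqs.queue("convert")
loop do
  # receive hides the message from other converters while we work on it
  message = convert_queue.receive
  if message
    # JSON.parse alone returns a Hash; ConvertJob.from_json is an assumed,
    # app-defined helper that rebuilds the job object
    job = ConvertJob.from_json(message.to_s)
    puts "Found a new message #{job.key}"
    job.run!
    message.delete  # delete only after the conversion succeeded
  else
    sleep(5)        # queue is empty: wait before polling again
  end
end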
RightScale is a cloud management platform
goal is to give you an overview of some of the cloud technologies, and how you can use Ruby to leverage them for your own app
will show you a sample app
no users, no permissions. nothing.
photo processing is a good sample app.
completely contrived
could scale pretty well if needed.
around 300 lines of code
tongue in cheek
boot in a few minutes
animoto story
starling (speaks the memcached protocol)
beanstalkd
people are starting to look at alternatives to traditional ACID databases. too much overhead, and not scalable enough.
database is first big bottleneck
very consistent: 100, 10,000, and 1,000,000 records all had similar response times (~100ms).
a query returns ids; you need to reload the items to get their attributes
first version of the API had a very custom, algebraic query language (union, intersection); now deprecated in favor of more SQL-like Select expressions.
you get a NextToken to retrieve the next set of results if the response is too big
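The token-based paging in that last note is a loop: call again with the token you were handed until none comes back. In this sketch, `fetch_page` stands in for a real select call and is hypothetical. It takes the previous token (nil for the first page) and returns the items plus the next token. The fake source serves seven ids, three per page.

```ruby
# Collect every result by following next-page tokens until none is returned.
def all_results(fetch_page)
  results = []
  token = nil
  loop do
    items, token = fetch_page.call(token)
    results.concat(items)
    break if token.nil?
  end
  results
end

# Fake paginated source: the token is just the next offset as a string.
ALL_IDS = (1..7).map { |i| "id-#{i}" }
fake_page = lambda do |token|
  offset = token.to_i            # nil.to_i is 0 on the first call
  page = ALL_IDS[offset, 3]
  next_offset = offset + page.size
  [page, next_offset < ALL_IDS.size ? next_offset.to_s : nil]
end

p all_results(fake_page)
```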