Scaling Mapufacture on Amazon Web Services

Some notes on scaling Mapufacture on Amazon Web Services given at the Michigan Ruby Users Group

Published in: Technology

  1. 1. Scaling Mapufacture with Amazon Web Services Andrew Turner http://highearthorbit.com andrew@mapufacture.com
  2. 2. Scaling as taught by gnomes Phase 1 - Scale a Web Service Phase 2 - ?????????? Phase 3 - $$$$PROFIT$$$$
  3. 3. Scaling a work in progress • We’re half-way along • Lessons learned • Pointers • Tools • Points of discussion
  4. 4. About me
  5. 5. Airships
  6. 6. Back down to earth
  7. 7. $8.99 http://oreilly.com/catalog/neogeography
  8. 8. Mapufacture Concept
  9. 9. Mapufacture Mobile
  10. 10. Mapufacture Widgets
  11. 11. RESTful HTML - http://mapufacture.com/maps GeoRSS - http://mapufacture.com/maps.rss Atom-GeoRSS - http://mapufacture.com/maps.atom GeoJSON - http://mapufacture.com/maps.json KML - http://mapufacture.com/maps.kml XML - http://mapufacture.com/maps.xml Maps - http://mapufacture.com/maps (atom, rss, json, kml, xml) Feeds - http://mapufacture.com/feeds (atom, rss, json, kml, xml) Tags - http://mapufacture.com/tags (atom, rss, json, kml, xml) Users - http://mapufacture.com/users (atom, rss, json, kml, xml) http://mapufacture.com/about/developers
  12. 12. OpenSearch OpenSearch-Geo & Time • keyword - a free-form string to filter search. Applied to title, descriptions, tags, categories. Can further filter by using "field:keyword", for example keyword=title:dogs • location - toponymic name (e.g. Boston or France) to search within • lat, lon - latitude and longitude of the center of a search. Use with distance • distance - the radius, in miles, to search around the center • bbox - the bounding box string to search within. Order is "west, south, east, north" • limit - maximum number of results • page - page number of the results to return http://mapufacture.com/search.atom?location=Ann+Arbor
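
The search parameters on this slide can be combined into a query string. The following is a minimal sketch, assuming the endpoint and parameter names shown on the slide; the helper name `mapufacture_search_url` and the constant `SEARCH_ENDPOINT` are hypothetical and not part of any Mapufacture client library.

```ruby
require 'uri'

# Endpoint taken from the slide; the constant and helper names are illustrative only.
SEARCH_ENDPOINT = 'http://mapufacture.com/search.atom'

# Builds a search URL from the parameters listed on the slide,
# skipping any parameter the caller did not supply.
def mapufacture_search_url(params = {})
  allowed = [:keyword, :location, :lat, :lon, :distance, :bbox, :limit, :page]
  pairs = allowed.select { |k| params.key?(k) }.map { |k| [k.to_s, params[k].to_s] }
  query = URI.encode_www_form(pairs)
  query.empty? ? SEARCH_ENDPOINT : "#{SEARCH_ENDPOINT}?#{query}"
end

puts mapufacture_search_url(:location => 'Ann Arbor')
puts mapufacture_search_url(:keyword => 'title:dogs', :lat => 42.28, :lon => -83.74,
                            :distance => 5, :limit => 10)
```

The first call reproduces the example URL from the slide. The coordinates in the second call are placeholder values.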
  13. 13. PocketMaps
  14. 14. Platform • Ruby on Rails • PostgreSQL with PostGIS • Memcached • Lucene and Solr • Apache • Mongrel
  15. 15. Grow only as fast as necessary i.e. don’t be a Mentarbator http://flickr.com/photos/mattdm/153703472/
  16. 16. Started with the little things... • memcached • database indexes • clean code • ruby-prof, bleakhouse http://cfis.savagexi.com/articles/2007/07/10/how-to-profile-your-rails-application • Still can be slow Find me all the items about Jazz bands playing between next Friday and Sunday within 5 miles of my house
  17. 17. Shared Server (diagram labels: Internet, FastCGI, Rails, MySQL) • Shared server • FastCGI • MySQL • Cheap
  18. 18. Dedicated Server (diagram labels: Internet, Apache, Mongrel, Rails, Solr, Lucene, PostgreSQL) • Dedicated server • Mongrel • PostgreSQL • Not Cheap
  19. 19. + Decisions • Scale the aggregator • Job queues • Don’t trust instances • Cache feeds and data sources
  20. 20. + SQS (diagram labels: Dedicated Server, Apache, Mongrel, Rails, PostgreSQL, Solr, Lucene, SQS, S3, Fast Update, and two EC2 instances each labeled SSH and Ruby)
  21. 21. TLA’s Decoded • EC2 - Elastic Compute Cloud: more computing than you can shake a stick at • S3 - Simple Storage Service: a place to keep your stuff • SQS - Simple Queue Service: easier than DRb • AWS - Alexa Web Services: who’s the most popular?
  22. 22. How to use EC2 http://flickr.com/photos/davebluedevil/17508904/
  23. 23. Easy to Start 1. Sign up for Amazon Web Services http://google.com/search?q=amazon+web+services 2. Get the EC2 Command line tools http://google.com/search?q=ec2+tools 3. Choose a base AMI (aka OS image) something nice and stable, like Debian Etch 4. Install-fest 5. Store to S3
  24. 24. Debian Etch AMI Our Install: apt-get install -y apache2 irb rdoc gcc make memcached build-essential libgeos-c1 postgresql-8.1-postgis postgresql-client postgis subversion graphicsmagick postgresql-8.1-postgis postgresql-8.1-plruby postgresql-dev sun-java5-jre sun-java5-jdk; install ruby and rubygems; gem install -y mongrel mongrel_cluster rake rails fakeweb hpricot mofo gruff graticule geonames coderay clusterer feedtools postgres fastercsv libmagick9-dev; configure postgresql with postgis; set up apache for proxying
  25. 25. Debian Etch AMI Our Install (same list as the previous slide): Do once, but never again
  26. 26. Amazon Tools ec2-add-group ec2-describe-group ec2-modify-image-attribute ec2-add-keypair ec2-describe-image-attribute ec2-reboot-instances ec2-authorize ec2-describe-images ec2-register ec2-cmd ec2-describe-instances ec2-reset-image-attribute ec2-confirm-product-instance ec2-describe-keypairs ec2-revoke ec2-delete-group ec2-fingerprint-key ec2-run-instances ec2-delete-keypair ec2-gem-example.rb ec2-terminate-instances ec2-deregister ec2-get-console-output ec2-version
  27. 27. EC2 UI
  28. 28. Setup $ set|grep -i ec2 EC2_CERT=/Users/ajturner/.ec2-mapufacture/cert-PF4KK6RXUMGG5ZEUTQB4IKJNWZTPGY7Z.pem EC2_HOME=/Applications/Internet/ec2 EC2_PRIVATE_KEY=/Users/ajturner/.ec2-mapufacture/pk-PF4KK6RXUMGG5ZEUTQB4IKJNWZTPGY7Z.pem
  29. 29. Fire it up $ ec2-run-instances ami-d7cd28be -k ~/.ssh/mapufacture.pem $ ec2-describe-instances RESERVATION r-0d23cc46 541028356920 default INSTANCE i-7c3bd117 ami-d7cd28be ec2-67-202-4-648.z-1.compute-1.amazonaws.com domU-12-31-36-00-0A-89.z-1.compute-1.internal running mapufacture 0 m1.small 2007-11-02T17:40:05+0000 RESERVATION r-28e40a41 541028356920 default INSTANCE i-2d10e533 ami-d7cd28be ec2-67-202-27-894.compute-1.amazonaws.com domU-12-31-37-00-36-45.compute-1.internal running mapufacture 0 m1.small 2007-11-19T15:14:40+0000 $ ssh -i ~/.ssh/mapufacture.pem root@ec2-67-202-27-894.compute-1.amazonaws.com
  30. 30. Behold the power... %w[rubygems EC2].each{|g| require g } ec2 = EC2::Base.new( :access_key_id => AWS_ACCESS_KEY, :secret_access_key => AWS_SECRET_KEY) ec2_images = ec2.describe_images(:owner_id => "514028356902").imagesSet.item ec2.run_instances :image_id => ec2_images.first.imageId, :user_data => "sub-domain" ec2.describe_instances.reservationSet.item.first.instancesSet.item.first.dnsName
  31. 31. Earls #!/usr/bin/env bash wget -q -O /tmp/user-data.out http://169.254.169.254/1.0/user-data HN=`cat /tmp/user-data.out` rm -f /tmp/user-data.out wget -q -O - --http-user=USER --http-passwd=PASS "http://dynamic.zoneedit.com/auth/dynamic.html?host=$HN" > /dev/null 2> /dev/null hostname $HN
  32. 32. ZoneEdit
  33. 33. Capazon don’t use the gem http://niblets.wordpress.com/2007/02/12/capistrano-ec2-sitting-in-a-tree-k-i-s-s-i-n-g/
  34. 34. Read more Details Somewhere Else http://docs.amazonwebservices.com/AmazonEC2/gsg/2007-01-03/
  35. 35. How to use S3 http://flickr.com/photos/harshadsharma/44793133/
  36. 36. bukkits
  37. 37. buckets http://s3.amazonaws.com/pocketmaps/feed_11.pdf bucket filename
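
The slide labels the two path segments of an S3 URL as bucket and filename. As a small illustration of that layout, the sketch below splits a path-style URL of the form shown on the slide; the helper name `split_s3_url` is hypothetical, and the code assumes the bucket is the first path segment.

```ruby
require 'uri'

# Splits a path-style S3 URL into [bucket, key].
# Assumes the "http://s3.amazonaws.com/<bucket>/<filename>" layout from the slide.
def split_s3_url(url)
  path = URI.parse(url).path                      # "/pocketmaps/feed_11.pdf"
  bucket, key = path.sub(%r{\A/}, '').split('/', 2)
  [bucket, key]
end

bucket, key = split_s3_url('http://s3.amazonaws.com/pocketmaps/feed_11.pdf')
puts "bucket=#{bucket} key=#{key}"   # prints "bucket=pocketmaps key=feed_11.pdf"
```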
  38. 38. aws-s3 %w[rubygems aws-s3 open-uri].each { |g| require g } AWS::S3::Base.establish_connection!( :access_key_id => AWS_ACCESS_KEY, :secret_access_key => AWS_SECRET_KEY ) AWS::S3::S3Object.store(filename, open(filename), 'bucket', :access => :public_read) # other :access values: :private, :public_read_write, :authenticated_read
  39. 39. attachment_fu Model: source_cache.rb class SourceCache < ActiveRecord::Base belongs_to :source has_attachment :max_size => 40.megabytes, :storage => :s3 end Model: source.rb def get_with_cache file_cache = self.source_caches.first || SourceCache.new file_cache.url = self.url file_cache.source = self file_cache.id = self.id if file_cache.new_record? file_cache.save self.get_without_cache(file_cache.public_filename) end alias_method_chain :get, :cache
  40. 40. Forklift
  41. 41. S3 Firefox Organizer
  42. 42. How to use SQS http://flickr.com/photos/81436485@N00/1329153675/
  43. 43. sqs %w[rubygems sqs].each { |g| require g } SQS.access_key_id = AWS_ACCESS_KEY SQS.secret_access_key = AWS_SECRET_KEY my_queue = SQS.get_queue('FastQueue') my_queue.send_message "feed:20" messages = my_queue.receive_messages(:count => count) messages.each do |message| puts message.body end
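
The slide sends a message body of the form "feed:20". A worker on the receiving side has to turn that string back into a job. The sketch below shows one way this might look, using Ruby's in-process `Queue` as a stand-in for the SQS queue so that it runs without AWS credentials; `parse_job` and `drain` are hypothetical names, and the "type:id" interpretation of the message body is an assumption.

```ruby
require 'thread'

# Parses a "type:id" message body such as "feed:20" (format assumed from the slide).
def parse_job(body)
  type, id = body.split(':', 2)
  raise ArgumentError, "malformed job: #{body.inspect}" if id.nil? || id !~ /\A\d+\z/
  [type, id.to_i]
end

# Drains the queue, yielding each parsed job to the caller's handler.
# Returns the list of jobs that were handled.
def drain(queue)
  processed = []
  until queue.empty?
    type, id = parse_job(queue.pop)
    yield type, id
    processed << [type, id]
  end
  processed
end

queue = Queue.new        # in-memory stand-in for SQS.get_queue('FastQueue')
queue << 'feed:20'
queue << 'feed:21'
drain(queue) { |type, id| puts "would refresh #{type} #{id}" }
```

Rejecting malformed bodies up front keeps a bad message from reaching the aggregator code.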
  44. 44. sqs warning! In SQS-0.1.5/lib/sqs.rb, around line 350: class Element ... # modified b/c trounces REXML # def attribute ; self.node_text( "Attribute" ) ; end ...
  45. 45. Alternatives? • DRb • Starling • Starfish
  46. 46. How to use Alexa http://flickr.com/photos/devos/95230930/
  47. 47. Ranking >> alexa_rank = AWS::Alexa.rank("mapufacture.com") => 1 264 223 >> alexa_rank = AWS::Alexa.rank("rubymi.org") => 7 714 998 >> alexa_rank = AWS::Alexa.rank("ruby-lang.org") => 38 265
  48. 48. Where we’re going (diagram labels: EC2, Dedicated Server, Load Balance, Apache, Mongrel, Rails, PostgreSQL, Lucene, DB, Index, High-Read Replication, User-Edits)
  49. 49. Where we’re going (diagram labels: EC2 x3, DB and Index x3, DB, Aggregators x3, Dedicated Server, SQS, Load Balance, Batch?, PostgreSQL, Lucene)
  50. 50. Replication • PostgreSQL: Slony-I • Master to multiple slaves • Fail-over (diagram labels: SQL update, Master, Update, Slave, Slave)
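
With one master and several slaves, the application has to decide which server each statement goes to. Slony-I handles the replication itself; the sketch below only illustrates application-side routing, sending writes to the master and rotating reads across slaves. The class `ReplicaRouter` and the server names are hypothetical, and this is not presented as Mapufacture's implementation. Fail-over is not covered.

```ruby
# Routes write statements to the master and spreads reads across slaves round-robin.
class ReplicaRouter
  WRITE_PATTERN = /\A\s*(insert|update|delete)\b/i

  def initialize(master, slaves)
    @master = master
    @slaves = slaves
    @next = 0
  end

  # Returns the server that should run the given SQL statement.
  def target_for(sql)
    return @master if @slaves.empty? || sql =~ WRITE_PATTERN
    slave = @slaves[@next % @slaves.size]
    @next += 1
    slave
  end
end

router = ReplicaRouter.new('master', %w[slave1 slave2])
puts router.target_for("UPDATE feeds SET title = 'x'")  # master
puts router.target_for('SELECT * FROM feeds')           # slave1
puts router.target_for('SELECT * FROM maps')            # slave2
```

A real deployment would also need to account for replication lag, since a read issued right after a write may not yet see it on a slave.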
  51. 51. Database Sharding • Dr. Nic’s Magic Models http://magicmodels.rubyforge.org/magic_multi_connections/ • ActiveDelegate http://www.robbyonrails.com/articles/2007/10/05/multiple-database-connections-in-ruby-on-rails
  52. 52. ActiveDelegate # database.yml login: &login adapter: postgresql host: localhost port: 5432 production: database: mapufacture_local <<: *login # NOTICE THE NEXT ENTRY/KEY master_database: database: mapufacture <<: *login # app/models/master_database.rb class MasterDatabase < ActiveRecord::Base handles_connection_for :master_database end # app/models/animal.rb class Animal < ActiveRecord::Base delegates_connection_to :master_database, :on => [:create, :save, :destroy] end
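
The database.yml on the ActiveDelegate slide relies on a YAML anchor (`&login`) and merge key (`<<: *login`) to share connection settings between the `production` and `master_database` entries. The sketch below loads a reconstruction of that fragment with Ruby's standard YAML library to show what each entry resolves to. The indentation is reconstructed, since the slide transcript lost it, and the printed URL-style summary is only for illustration.

```ruby
require 'yaml'

# Reconstruction of the slide's database.yml; indentation is assumed.
CONFIG = <<-EOS
login: &login
  adapter: postgresql
  host: localhost
  port: 5432
production:
  database: mapufacture_local
  <<: *login
master_database:
  database: mapufacture
  <<: *login
EOS

# Newer Psych versions reject aliases in YAML.load, so use unsafe_load when it exists.
config = YAML.respond_to?(:unsafe_load) ? YAML.unsafe_load(CONFIG) : YAML.load(CONFIG)

%w[production master_database].each do |name|
  entry = config[name]
  puts "#{name}: #{entry['adapter']}://#{entry['host']}:#{entry['port']}/#{entry['database']}"
end
```

Both entries end up with the same adapter, host, and port and differ only in the database name, which is what lets the delegated models write to a different database on the same server settings.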
  53. 53. Costs • EC2 (hour / day / month; specs): S - $0.10 / $2.40 / $72; 1.7 GB memory, ~1x 1 GHz Xeon, 160 GB storage. M - $0.40 / $9.60 / $288; 7.5 GB memory, ~2x 2 GHz Xeon, 850 GB storage. L - $0.80 / $19.20 / $576; 15 GB memory, ~4x 2 GHz Xeon, 1.7 TB storage • Bandwidth: Upload $0.10 per GB; Download $0.18 per GB (first 10 TB / month); Requests $0.01 per 1,000 PUT or LIST requests, $0.01 per 10,000 GET and all other requests* • S3: Storage $0.15 per GB / month; Upload $0.10 per GB; Download $0.18 per GB (first 10 TB / month); Requests $0.01 per 1,000 PUT or LIST requests, $0.01 per 10,000 GET and all other requests* • SQS: Messages $0.10 per 1,000 messages sent
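
The daily and monthly EC2 figures on the costs slide follow from the hourly rate: 24 hours per day and, judging by the numbers, a 30-day month. The sketch below redoes that arithmetic in whole cents to avoid floating-point rounding; the constant and helper names are illustrative.

```ruby
# Hourly EC2 prices from the slide, in cents so the arithmetic stays exact.
HOURLY_CENTS = { 'S' => 10, 'M' => 40, 'L' => 80 }

# Formats a cent amount as dollars, e.g. 240 -> "$2.40".
def dollars(cents)
  format('$%d.%02d', cents / 100, cents % 100)
end

HOURLY_CENTS.each do |size, cents|
  day = cents * 24
  month = day * 30   # the slide's monthly figures imply a 30-day month
  puts "#{size}: #{dollars(cents)}/hour, #{dollars(day)}/day, #{dollars(month)}/month"
end
```

For the small instance this gives $2.40 per day and $72.00 per month, matching the slide; bandwidth, S3, and SQS charges come on top of these figures.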
  54. 54. Remaining Questions • Round Robin vs. Load Balancing • Dynamic scaling • Replication speed • Size Benefit (small, medium, large)
  55. 55. $8.99
  56. 56. here be dragons
