Discovery 2015 Workshop
Presentation at the Discovery 2015 Workshop on Cloud Computing at Berkeley

  1. There is no magic, there is only awesome. Scientific computing with Amazon Web Services. Deepak Singh, Business Development Manager - Amazon Compute Services. Discovery 2015 Workshop, July 23 2010
  2. Via Reavel under a CC-BY-NC-ND license
  3. life science industry
  4. Credit: Bosco Ho
  5. By ~Prescott under a CC-BY-NC license
  6. <1>
  7. the cloud
  8. has_many :definitions
  9. infrastructure as a service
  10. The “Living and Evolving” Cloud: AWS services and basic terminology
      Most Applications Need: 1. Compute 2. Storage 3. Messaging 4. Payment 5. Distribution 6. Scale 7. Analytics
      [Diagram labels: Your Application; Amazon RDS; Amazon Elastic MapReduce JobFlows; Payment: Amazon FPS/DevPay; Amazon SimpleDB Domains; Amazon SQS Queues; Amazon SNS Topics; Auto-Scaling; Elastic LB; CloudWatch; Amazon CloudFront; Amazon S3 Objects and Buckets; Amazon EC2 Instances (On-Demand, Reserved, Spot); EBS Volumes; Snapshots; Amazon Virtual Private Cloud; Amazon Worldwide Physical Infrastructure (Geographical Regions, Availability Zones, Edge Locations)]
  11-14. [Built up over four slides, with "Automation" on every build]
      • Scalable: increase or decrease capacity in minutes
      • Cost Effective: low rate, pay-as-you-go
      • Reliable: mission critical infrastructure
      • Secure: multilayer security facilities
  15. compute
  16. elastic compute cloud
  17. elastic
  18. 3000 CPUs for one firm's risk management application [Chart: Number of EC2 Instances (axis marks at 300 and 3000) by day, Wednesday 4/22/2009 through Tuesday 4/28/2009; annotation: 300 CPUs on weekends]
  19. programmable
  20. // Run an instance
      $EC2 = new AmazonEC2();

      $Options = array('KeyName' => "Jeff's Keys",
                       'InstanceType' => "m1.small");

      $Res = $EC2->run_instances("ami-db7b9db2", 1, 1, $Options);
  21. more later
  22. cost effective
  23. 3000 CPUs for one firm's risk management application [Chart: Number of EC2 Instances (axis marks at 300 and 3000) by day, Wednesday 4/22/2009 through Tuesday 4/28/2009; annotation: 300 CPUs on weekends]
  24-27. [Chart, built up over four slides: % Utilization against time, comparing Ideal Effective Utilization with Real Utilization]
  28. on-demand instances, reserved instances, spot instances
  29. Amazon EC2 On-Demand price for the same instance is $0.50
  30-34. [Chart, built up over five slides: % Utilization against time, with Ideal Effective Utilization covered in turn by Reserved Utilization, On Demand Utilization, and Spot Utilization]
  35. secure
  36. [Diagram labels: Customer A, Customer B, … Customer Z; Hypervisor; Firewall; Physical Interface]
      • Guest operating system doesn't have elevated privilege level.
      • Instances are completely isolated.
      • Intrinsic network firewall.
      • No access to raw devices.
      • Virtualized disks, logically isolated, wiped clean after use.
  37. {
        "Version": "2008-10-17",
        "Id": "Queue1_Policy_UUID",
        "Statement": {
          "Sid": "Queue1_AnonymousAccess_ReceiveMessage_TimeLimit",
          "Effect": "Allow",
          "Principal": { "AWS": "*" },
          "Action": "SQS:ReceiveMessage",
          "Resource": "/987654321098/queue1",
          "Condition": {
            "DateGreaterThan": { "AWS:CurrentTime": "2009-01-31T12:00Z" },
            "DateLessThan": { "AWS:CurrentTime": "2009-01-31T15:00Z" }
          }
        }
      }
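The policy on slide 37 allows anonymous `SQS:ReceiveMessage` calls only inside a three-hour window. As a rough illustration of how that date condition reads, the sketch below checks a timestamp against the two `AWS:CurrentTime` bounds. It is not how AWS evaluates policies: `parse_utc` and `within_window?` are hypothetical helper names, only the condition fragment is reproduced, and both bounds are assumed to be exclusive.

```ruby
require 'json'

# Condition fragment copied from the slide's policy document.
POLICY_CONDITION = JSON.parse(<<~POLICY)
  {
    "DateGreaterThan": { "AWS:CurrentTime": "2009-01-31T12:00Z" },
    "DateLessThan":    { "AWS:CurrentTime": "2009-01-31T15:00Z" }
  }
POLICY

# Hypothetical helper: parse the "YYYY-MM-DDTHH:MMZ" form used on the slide.
def parse_utc(stamp)
  m = stamp.match(/\A(\d{4})-(\d{2})-(\d{2})T(\d{2}):(\d{2})Z\z/)
  raise ArgumentError, "unexpected timestamp: #{stamp}" unless m
  Time.utc(*m.captures.map(&:to_i))
end

# Hypothetical helper: true when `now` lies strictly between the two bounds.
def within_window?(condition, now)
  after  = parse_utc(condition["DateGreaterThan"]["AWS:CurrentTime"])
  before = parse_utc(condition["DateLessThan"]["AWS:CurrentTime"])
  now > after && now < before
end

puts within_window?(POLICY_CONDITION, Time.utc(2009, 1, 31, 13, 30))
puts within_window?(POLICY_CONDITION, Time.utc(2009, 1, 31, 16, 0))
```

The first call falls inside the window and the second falls after it, so a request at 16:00 UTC would not match this statement.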
  38. Amazon Virtual Private Cloud (VPC) [Diagram labels: Customer's isolated AWS resources; Subnets; Router; VPN Gateway; Amazon Web Services Cloud; Secure VPN Connection over the Internet; Customer's Network]
  39. storage
  40. Amazon S3
  41. highly scalable
  42. highly available
  43. highly durable
  44-46. [Diagram, built up over three slides. Labels: Region; Datacenter (three shown); Node 1 ... Node n] Note: Conceptual drawing only. Actual number of nodes & datacenters may vary
  47. elastic block store
  48. block device
  49. resizable
  50. boot device
  51. one size does not fit all
  52. Amazon S3
      • Cost-effective blob or large object storage
      • Minimal relationships between objects
      Amazon EC2 + EBS
      • Multiple flavors of database engine
      • Complete control
      Amazon SimpleDB
      • Zero administrative overhead (automatic handling of geo-redundant replication, index creation, database tuning)
      • Automatic and elastic scaling of resources to meet request load
      • High availability (multiple copies of data for reliability and failover)
      • Flexibility (schema-less data store)
      Amazon RDS
      • Native access to database engine
      • Easy migration path (existing code, tools, application are compatible)
      • Key features of a relational database, such as joins or complex transactions
      • Managed experience (offload common DBA tasks, lower total cost of ownership)
  53. an ecosystem prospers
  54. <2>
  55. infrastructure as code
  56. Source: Chris Dagdigian
  57. • Images:
        – RegisterImage
        – DescribeImages
        – DeregisterImage
        – ModifyImageAttribute
        – DescribeImageAttribute
        – ResetImageAttribute
      • Instances:
        – RunInstances
        – DescribeInstances
        – TerminateInstances
        – StopInstances
        – GetConsoleOutput
        – RebootInstances
        – CreatePlacementGroup
        – DescribePlacementGroup
      • IP Addresses:
        – AllocateAddress
        – ReleaseAddress
        – AssociateAddress
        – DisassociateAddress
        – DescribeAddresses
      • Keypairs:
        – CreateKeyPair
        – DescribeKeyPairs
        – DeleteKeyPair
      • Security Groups:
        – CreateSecurityGroup
        – DescribeSecurityGroups
        – DeleteSecurityGroup
        – AuthorizeSecurityGroupIngress
        – RevokeSecurityGroupIngress
      • Block Storage Volumes:
        – CreateVolume
        – DeleteVolume
        – DescribeVolumes
        – AttachVolume
        – DetachVolume
        – CreateSnapshot
        – DescribeSnapshots
        – DeleteSnapshot
      • VPC:
        – CreateCustomerGateway
        – DeleteCustomerGateway
        – DescribeCustomerGateways
        – AssociateDhcpOptions
        – CreateDhcpOptions
        – DeleteDhcpOptions
        – DescribeDhcpOptions
        – CreateSubnet
        – DeleteSubnet
        – DescribeSubnets
        – CreateVpc
        – DeleteVpc
        – DescribeVpcs
        – CreateVpnConnection
        – DeleteVpnConnection
        – DescribeVpnConnections
        – AttachVpnGateway
        – CreateVpnGateway
        – DeleteVpnGateway
        – DescribeVpnGateways
        – DetachVpnGateway
  58. using libraries
  59. Callout: Access credentials
      def access_key
        options.services['access-key']
      end

      def secret_key
        options.services['secret-key']
      end
  60. class EC2
        attr_accessor :ec2, :instance_index, :image_index, :elastic_ip_index, :volume_index

        def initialize(access_key, secret_key)
          @ec2 = RightAws::Ec2.new(access_key, secret_key)
          @instance_index = {}
          @image_index = {}
          @elastic_ip_index = {}
          @volume_index = {}
        end
      end
  61. class Instance
        attr_accessor :aws_hash, :elastic_ip

        def initialize(hash, elastic_ip = nil)
          @aws_hash = hash
          @elastic_ip = elastic_ip
        end

        def public_dns
          @aws_hash[:dns_name] || ""
        end

        def friendly_name
          public_dns.empty? ? status.capitalize : public_dns.split(".")[0]
        end

        def id
          @aws_hash[:aws_instance_id]
        end
      end
  62. Callout: Custom index. The EC2 class from slide 60, with this method added:
        def instance_index
          if @instance_index.empty?
            @ec2.describe_instances.each do |i|
              # create an Instance object & add to the array
              @instance_index[i[:aws_instance_id]] = Instance.new(i, get_elastic_ip_for_instance_id(i[:aws_instance_id]))
            end
          end
          return @instance_index
        end
  63. Callout: Helper. The Instance class from slide 61, with this method added:
        def running?
          status == "running"
        end
  64. configuration management
  65. cfengine, puppet, chef
  66. chef
  67. dsl
  68. include_recipe "packages"
      include_recipe "ruby"
      include_recipe "apache2"

      if platform?("centos","redhat")
        if dist_only?
          # just the gem, we'll install the apache module within apache2
          package "rubygem-passenger"
          return
        else
          package "httpd-devel"
        end
      else
        %w{ apache2-prefork-dev libapr1-dev }.each do |pkg|
          package pkg do
            action :upgrade
          end
        end
      end

      gem_package "passenger" do
        version node[:passenger][:version]
      end

      execute "passenger_module" do
        command 'echo -en "\n\n\n\n" | passenger-install-apache2-module'
        creates node[:passenger][:module_path]
      end
  69-73. The recipe from slide 68, repeated with one callout per slide:
      • Modular: the include_recipe lines
      • OS aware: the platform?("centos","redhat") branch
      • Ruby syntax: the %w{ ... }.each loop
      • Package aware: the gem_package "passenger" block
      • Execute: the execute "passenger_module" block
  74. recipes
  75. template "#{node[:apache][:dir]}/mods-available/passenger.conf" do
        cookbook "passenger_apache2"
        source "passenger.conf.erb"
        owner "root"
        group "root"
        mode 0755
      end
  76-77. The template resource from slide 75, repeated with one callout per slide:
      • Template: the template resource itself
      • Cookbook re-use: the cookbook "passenger_apache2" line
  78. <3>
  79. architectural lessons
  80. design for failure
  81. “Everything fails, all the time” -- Werner Vogels
  82. “Things will crash. Deal with it” -- Jeff Dean
  83. 2-4% of servers will die annually. Source: Jeff Dean, LADIS 2009
  84. 1-5% of disk drives will die every year. Source: Jeff Dean, LADIS 2009
  85-86. Annualized failure rates (AFR). Source: James Hamilton (http://perspectives.mvdirona.com)
      • 2.3% AFR in population of 13,250
      • 3.3% AFR in population of 22,400
      • 4.2% AFR in population of 246,000
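To make these rates concrete, the sketch below multiplies each AFR by its population to estimate how many units fail in a year. The three pairs are the ones quoted on the slide. `expected_failures` is a hypothetical helper name, and applying the rate uniformly to every unit is a simplifying assumption.

```ruby
# (AFR in percent, population) pairs quoted on the slide.
OBSERVATIONS = [
  [2.3, 13_250],
  [3.3, 22_400],
  [4.2, 246_000]
].freeze

# Hypothetical helper: expected failures per year, assuming the AFR applies
# uniformly to every unit in the population.
def expected_failures(afr_percent, population)
  (afr_percent / 100.0 * population).round
end

OBSERVATIONS.each do |afr, population|
  puts "#{afr}% of #{population}: about #{expected_failures(afr, population)} failures per year"
end
```

Even the lowest rate works out to hundreds of failed units a year at these population sizes, which is the point behind "design for failure".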
  87-88. human errors: ~20% admin issues have unintended consequences. Source: James Hamilton (http://perspectives.mvdirona.com)
  89. assume sw/hw failure
  90. avoid single points of failure
  91. system as a whole is resilient
  92-93. loose coupling sets you free
  94. using message queues
  95. [Diagram. Tight Coupling: Controller A, Controller B, Controller C linked directly. Loose Coupling using Queues: Controller A, Controller B, Controller C, each with a queue (Q) in front of it]
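The loose-coupling idea on slide 95 can be sketched with in-process queues standing in for a message service such as SQS. This is a minimal illustration, not the deck's code: the three controllers, the `STOP` sentinel, and the `run_pipeline` name are all assumptions. Each stage knows only the queue it reads from and the queue it writes to.

```ruby
require 'thread'

# Sentinel telling the next stage that no more work is coming (illustrative).
STOP = :stop

# Hypothetical pipeline: controller A emits work, B transforms it, C collects it.
# The stages never call each other; they only share queues.
def run_pipeline(inputs)
  a_to_b = Queue.new
  b_to_c = Queue.new
  results = []

  controller_a = Thread.new do
    inputs.each { |item| a_to_b << item }
    a_to_b << STOP
  end

  controller_b = Thread.new do
    while (item = a_to_b.pop) != STOP
      b_to_c << item.upcase
    end
    b_to_c << STOP
  end

  controller_c = Thread.new do
    while (item = b_to_c.pop) != STOP
      results << "done:#{item}"
    end
  end

  [controller_a, controller_b, controller_c].each(&:join)
  results
end

p run_pipeline(%w[job1 job2 job3])
```

Because the queues buffer work, a slow or restarted middle stage delays the pipeline rather than breaking the stages on either side of it.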
  96. implement elasticity
  97. no assumptions
  98. resilience to reboot
  99. bootstrap
  100. dynamic
  101. multi-layered security
  102. “Web” Security Group:
         TCP  80    0.0.0.0/0
         TCP  443   0.0.0.0/0
         TCP  22    “App”
       “App” Security Group:
         TCP  8080  “Web”
         TCP  22    172.154.0.0/16
         TCP  22    “App”
       “DB” Security Group:
         TCP  3306  “App”
         TCP  3306  163.128.25.32/32
         TCP  22    “App”
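The layered rules on slide 102 can be read as a small lookup table: each group accepts a port either from an address range or from members of another group. The sketch below models that reading with Ruby's standard `ipaddr` library. It is an illustration of the rule table only, not of how EC2 enforces security groups, and `allowed?` with its `from_group:` and `from_ip:` keywords is a hypothetical helper.

```ruby
require 'ipaddr'

# Ingress rules transcribed from the slide: [protocol, port, source].
# A source is either a CIDR block or the name of another security group.
SECURITY_GROUPS = {
  "Web" => [["TCP", 80, "0.0.0.0/0"], ["TCP", 443, "0.0.0.0/0"], ["TCP", 22, "App"]],
  "App" => [["TCP", 8080, "Web"], ["TCP", 22, "172.154.0.0/16"], ["TCP", 22, "App"]],
  "DB"  => [["TCP", 3306, "App"], ["TCP", 3306, "163.128.25.32/32"], ["TCP", 22, "App"]]
}.freeze

# Hypothetical helper: is TCP traffic to `port` on group `target` allowed when
# it comes from `from_group` (a group name) or `from_ip` (an address string)?
def allowed?(target, port, from_group: nil, from_ip: nil)
  SECURITY_GROUPS.fetch(target).any? do |_protocol, rule_port, source|
    next false unless rule_port == port
    if SECURITY_GROUPS.key?(source)
      source == from_group
    else
      !from_ip.nil? && IPAddr.new(source).include?(IPAddr.new(from_ip))
    end
  end
end

puts allowed?("Web", 443, from_ip: "198.51.100.7")
puts allowed?("DB", 3306, from_group: "App")
puts allowed?("DB", 3306, from_ip: "198.51.100.7")
puts allowed?("DB", 3306, from_group: "Web")
```

In this model the web tier is reachable from anywhere on ports 80 and 443, while the database port is reachable only from the "App" group and the single listed address.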
  103. embrace constraints
  104. distributed memory
  105. sharded DBs
  106. hardware failed? simply throw it away and switch to new hardware with no additional cost
  107. cache
  108. think parallel
  109. different architectures
  110. multi-threaded, concurrent requests
  111. mapreduce
  112. elastic load-balancing
  113. decompose jobs into simplest form
  114. leverage many storage options
  115. <4>
  116. computing in the cloud
  117. 3 modalities
  118. batch processing
  119. “grids”
  120. queues
  121. [Diagram labels: URL Queue; Fetch & Store Page; Parse Queue; Parse Page; Image Queue; Fetch Images; Render Queue; Render Images & Pages; S3 (three places)] Source: Jeff Barr
  122. sudo gem install cloud-crowd
       http://wiki.github.com/documentcloud/cloud-crowd
  123. http://www.rightscale.com
  124. data-intensive computing
  125. Amazon Elastic MapReduce [Diagram labels: Deploy Application; Web Console, Command line tools; Elastic MapReduce; Hadoop on Amazon EC2 Instances; Input dataset; Input Data; Input S3 bucket; Output S3 bucket; output results; Get Results; Notify; End; Amazon S3]
  126. PREANNOUNCE - EXPAND/SHRINK CLUSTERS
       Use Case: Increase speed of running job flows
       • Speed up job flow execution in response to changing requirements
       • Dynamically balance cost versus performance without restarting a job
       [Diagram: Job Flow, Allocate 4 instances (Time remaining: 14 Hours) -> Job Flow, Expand to 9 instances (Time remaining: 7 Hours) -> Job Flow, Expand to 25 instances (Time remaining: 3 Hours)]
  127. Use Case: Agile Data Warehouse Cluster
       • Customize cluster size to support varying resource needs
       • Leverage flexibility to reduce costs and increase cluster utilization
       [Diagram: Data Warehouse (Steady State), Allocate 9 instances -> Data Warehouse (Batch Processing), Expand to 25 instances -> Data Warehouse (Steady State), Shrink to 9 instances]
  128. PREANNOUNCE - Integration with Spot Instances
       Cost without Spot: 4 instances * 14 hrs * $0.50 = $28
       Cost with Spot: 4 instances * 7 hrs * $0.50 = $14 + 5 instances * 7 hrs * $0.25 = $8.75; Total = $22.75
       Savings: ~19%
       [Diagram: Job Flow, Allocate 4 instances (Time remaining: 14 Hours) -> Job Flow, Expand to 9 instances (Time remaining: 7 Hours)]
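The cost comparison on slide 128 can be recomputed directly from the instance counts, hours, and hourly rates it states. The sketch below does that as a check on the arithmetic; `cost` is a hypothetical helper, and the two rates are the $0.50 and $0.25 figures from the slide rather than current prices. Note that the on-demand part of the Spot scenario, 4 instances for 7 hours at $0.50, comes to $14.

```ruby
ON_DEMAND_RATE = 0.50  # dollars per instance-hour, from the slide
SPOT_RATE      = 0.25  # dollars per instance-hour, from the slide

# Hypothetical helper: cost of running `instances` for `hours` at `rate`.
def cost(instances, hours, rate)
  instances * hours * rate
end

# 4 on-demand instances for the full 14 hours.
without_spot = cost(4, 14, ON_DEMAND_RATE)

# 4 on-demand instances plus 5 spot instances, finishing in 7 hours.
with_spot = cost(4, 7, ON_DEMAND_RATE) + cost(5, 7, SPOT_RATE)

savings_pct = (without_spot - with_spot) / without_spot * 100

puts format("without spot: $%.2f", without_spot)
puts format("with spot:    $%.2f", with_spot)
puts format("savings:      %.2f%%", savings_pct)
```

The job also finishes in half the wall-clock time in the Spot scenario, which the dollar comparison alone does not show.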
  129. high performance computing
  130. Low latency, high bandwidth
  131. cluster compute instances
  132. full bisection bandwidth
  133. 10 Gbps
  134. 2 * Xeon 5570 (Intel “Nehalem”); 23 GB RAM; 10 Gbps Ethernet; 1690 GB local disk; HVM-based virtualization; $1.60 / hr
  135. managing compute cycles
  136. http://cyclecomputing.com
  137. http://web.mit.edu/stardev/cluster/
  138. SQS
  139. <5>
  140. AWS + science = win
  141. 3.7 million classifications in just over three days; ~15 million in less than a month; >2.6 million clicks in 100 hours
  142. Biomarker Warehouse: pre-clinical, clinical, 3rd party data and publications. Estimated cost: 10 TB warehouse over 3 years
  143. Protein interactions @ U. Washington. Simple Python scripts automate the management of 1000s of simultaneous experiments using the EC2 API. http://faculty.washington.edu/danielt/ Source: Ed Lazowska
  144. 200 instances; 60000 structures; 4 hours. http://bioteam.net/aws
  145. HEAVY-ION COLLISIONS
       Problem: Quark matter physics conference imminent but no compute resources handy
       Solution: NIMBUS context broker allowed researchers to provision 300 nodes and get the simulations done
  146. Image: Wikipedia
  147. lots and lots and lots and lots and lots and lots of data and lots and lots of lots of data
  148. Image via image editor under a CC-BY License
  149. Image: NOAA
  150. scale, availability, utilization, sharing, collaboration
  151. we are data geeks not data center geeks
  152. BLAT @ U. Penn
       Map 100 million, 100 base paired end reads
       Quad core with 5 GB of RAM would take 16 days
       30 high-memory instances; 32 hours; $195
       Source: Angel Pizarro/John Hogenesch
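The BLAT figures invite a quick back-of-the-envelope comparison. The sketch below derives the wall-clock speedup, the instance-hours consumed, and the average cost per instance-hour from the numbers on the slide. Nothing here comes from the original analysis beyond those four figures; the variable names are illustrative.

```ruby
# Figures quoted on the slide.
SINGLE_MACHINE_DAYS = 16
CLOUD_INSTANCES     = 30
CLOUD_HOURS         = 32
CLOUD_COST_DOLLARS  = 195.0

single_machine_hours = SINGLE_MACHINE_DAYS * 24
speedup              = single_machine_hours.to_f / CLOUD_HOURS
instance_hours       = CLOUD_INSTANCES * CLOUD_HOURS
cost_per_instance_hr = CLOUD_COST_DOLLARS / instance_hours

puts "single machine: #{single_machine_hours} hours"
puts "wall-clock speedup: #{speedup}x"
puts "instance-hours used: #{instance_hours}"
puts format("average cost per instance-hour: $%.3f", cost_per_instance_hr)
```

The speedup is smaller than the instance count, so the comparison says nothing about per-node efficiency; what it shows is turnaround time bought for a known dollar amount.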
  153. BELLE MONTE CARLO. Credit: Tom Fifield
  154. MapReduce for Genomics. Ben Langmead
       http://bowtie-bio.sourceforge.net/crossbow/index.shtml
       http://contrail-bio.sourceforge.net
       http://bowtie-bio.sourceforge.net/myrna/index.shtml
  155. platform for science
  156. http://www.cloudbiolinux.com/
  157. http://usegalaxy.org/cloud
  158. http://dnanexus.com
  159. http://www.elasticr.net Elastic-R Collaborative Research Environment
  160. http://aws.amazon.com/publicdatasets/
  161. s3://1000genomes
  162. deesingh@amazon.com; Twitter: @mndoci; slides at http://slideshare.net/mndoci. Inspiration and material from Matt Wood, James Hamilton & Larry Lessig. By Oberazzi under a CC-BY-NC-SA license