Bio-IT World 2010 - Keynote talk
My keynote from Bio-IT World 2010

Presentation Transcript

  • There is no magic, there is only awesome: Scientific computing with Amazon Web Services. Deepak Singh, Bio-IT World 2010
  • By ~Prescott under a CC-BY-NC license
  • <1>
  • data
  • Image: Wikipedia
  • Source: http://www.nature.com/news/specials/bigdata/index.html
  • implications
  • data management, data processing, data sharing
  • <2>
  • Consumer (Retail) Business
  • Consumer (Retail) Business, Seller Business
  • Amazon Infrastructure: massive scale, highly available, efficient, service oriented, secure
  • infrastructure as a service
  • undifferentiated heavy lifting
  • Chart, 2001 to 2008: Bandwidth Consumed by Amazon Web Services vs. Bandwidth Consumed by Amazon's Global Websites
  • Amazon S3 Momentum. Total Number of Objects Stored in Amazon S3: 2.9 Billion (Q4 2006), 14 Billion (Q4 2007), 40 Billion (Q4 2008), 102 Billion (Q4 2009). Peak Requests: [value missing]
  • Compute: Amazon Elastic Compute Cloud (EC2), Elastic Load Balancing, Auto Scaling. Storage: Amazon Simple Storage Service (S3), AWS Import/Export. Database: Amazon RDS and SimpleDB.
  • Layered on top: Messaging: Amazon Simple Queue Service (SQS), Amazon Simple Notification Service (SNS). Payments: Amazon Flexible Payments Service (FPS). On-Demand Workforce: Amazon Mechanical Turk. Parallel Processing: Amazon Elastic MapReduce. Content Delivery: Amazon CloudFront.
  • Layered on top: Tools: AWS Toolkit for Eclipse, AWS SDK for .NET. Isolated Networks: Amazon Virtual Private Cloud. Monitoring: Amazon CloudWatch. Management: AWS Management Console.
  • Layered on top: Your Custom Applications and Services
  • scalability
  • > 1PB of data in S3
  • elasticity
  • 3000 CPUs for one firm's risk management application. Chart: Number of EC2 Instances, Wednesday 4/22/2009 through Tuesday 4/28/2009, with levels marked at 3000 and 300; 300 CPUs on weekends
  • highly available
  • “Everything fails, all the time” -- Werner Vogels
  • 2-4% of servers & 1-5% of disk drives will die annually Source: Jeff Dean, LADIS 2009
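
To make these failure rates concrete, here is a back-of-the-envelope sketch in Python. The fleet sizes are hypothetical and chosen only for illustration; only the 2-4% and 1-5% annual rates come from the slide.

```python
def expected_annual_failures(count, low_rate, high_rate):
    """Return the (low, high) expected number of failures per year."""
    return count * low_rate, count * high_rate

# Hypothetical fleet sizes, not taken from the talk.
SERVERS = 10_000
DISKS = 40_000

# Annual failure rates quoted on the slide (Jeff Dean, LADIS 2009).
server_low, server_high = expected_annual_failures(SERVERS, 0.02, 0.04)
disk_low, disk_high = expected_annual_failures(DISKS, 0.01, 0.05)

print(f"servers lost per year: {server_low:.0f} to {server_high:.0f}")
print(f"disks lost per year:   {disk_low:.0f} to {disk_high:.0f}")
```

Even at the low end, a fleet of this assumed size loses hardware most days of the year, which is why the following slides treat failure as the normal case.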
  • human errors
  • human errors: ~20% of admin issues have unintended consequences. Source: James Hamilton
  • scalable & available
  • assume sw/hw failure
  • design apps to be resilient
  • automation & alarming
  • Image: Chris Dagdigian
  • US East Region: Availability Zone A, Availability Zone B, Availability Zone C, Availability Zone D
  • elastic load balancing, CloudWatch, auto scaling, SNS, SQS, elastic IP, elastic block store
  • Tight coupling: Controller A → Controller B → Controller C. Loose coupling using queues: Controller A → queue → Controller B → queue → Controller C
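
One way to picture loose coupling using queues is the minimal Python sketch below. It is an illustration only: the in-process `queue.Queue` stands in for a hosted queue such as SQS, and the `controller` and `run_pipeline` names are hypothetical, not from the talk. Each controller reads only from its own inbox and writes only to the next queue, so no stage calls another directly.

```python
import queue
import threading

def controller(name, inbox, outbox):
    """Read work items from inbox, tag them, and pass them on via outbox."""
    while True:
        item = inbox.get()
        if item is None:
            # Sentinel: forward the shutdown signal and stop this stage.
            outbox.put(None)
            break
        outbox.put(f"{item}->{name}")

def run_pipeline(items):
    """Push items through controllers A, B and C, connected only by queues."""
    q_a, q_b, q_c, q_out = (queue.Queue() for _ in range(4))
    stages = [
        threading.Thread(target=controller, args=("A", q_a, q_b)),
        threading.Thread(target=controller, args=("B", q_b, q_c)),
        threading.Thread(target=controller, args=("C", q_c, q_out)),
    ]
    for stage in stages:
        stage.start()
    for item in items:
        q_a.put(item)
    q_a.put(None)
    for stage in stages:
        stage.join()

    # Drain the final queue up to the sentinel.
    results = []
    while True:
        result = q_out.get()
        if result is None:
            break
        results.append(result)
    return results

if __name__ == "__main__":
    print(run_pipeline(["job1", "job2"]))
```

Because the stages share nothing but queues, a slow or restarted controller only delays the items waiting in its inbox; the other stages keep running.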
  • cost effective: pay as you go, economies of scale, choices in pricing
  • on-demand instances, reserved instances, spot instances
  • on-demand instances: pay as you go, starting at $0.085/hour, best effort capacity
  • reserved instances: low up front fee, lower operating cost, guaranteed capacity
  • spot instances: bid on unused capacity; price based on supply and demand; you pay market price; instances can disappear
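
The spot model (bid on unused capacity, pay the market price, lose the instance when the price moves above the bid) can be sketched as a simple simulation. This is a deliberately simplified model, not AWS's actual pricing mechanism, and the hourly prices are made up for illustration.

```python
def simulate_spot(bid, market_prices):
    """Walk through hourly market prices for a single spot instance.

    Returns (hours_run, total_cost, interrupted).
    """
    hours_run = 0
    total_cost = 0.0
    for price in market_prices:
        if price > bid:
            # Market price rose above the bid: the instance disappears.
            return hours_run, total_cost, True
        hours_run += 1
        total_cost += price  # you pay the market price, not your bid
    return hours_run, total_cost, False

# Hypothetical hourly prices in dollars; not real AWS data.
prices = [0.03, 0.03, 0.04, 0.09, 0.03]
hours, cost, interrupted = simulate_spot(bid=0.05, market_prices=prices)
print(hours, round(cost, 2), interrupted)
```

The practical consequence is that work placed on spot capacity should checkpoint or be safe to rerun, since an interruption can arrive at any hour.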
  • http://cloudexchange.com
  • physical is free, network is easy, rest can be added
  • Customer 1, Customer 2, ... Customer n; Hypervisor; Virtual Interfaces; Customer 1 Security Groups, Customer 2 Security Groups, ... Customer n Security Groups; Firewall; Physical Interfaces
  • Amazon VPC Architecture: customer's isolated AWS resources in subnets 10.32.1.0/24, 10.32.2.0/24 and 10.32.3.0/24; VPN Gateway; Amazon Web Services Cloud; secure VPN connection over the Internet; your network; external customers
  • http://aws.amazon.com/security
  • <3>
  • AWS + science
  • data management
  • Biomarker Warehouse: pre-clinical, clinical, 3rd party data and publications. Estimated cost of a 10 TB warehouse over 3 years, compared for Cloud (AWS), Managed service and In house, split into Servers, Storage and Overhead [dollar figures garbled in source]
  • data processing
  • http://cyclecomputing.com
  • sudo gem install cloud-crowd http://cyclecomputing.com http://wiki.github.com/documentcloud/cloud-crowd
  • http://www.rightscale.com
  • Amazon Elastic MapReduce: input data goes into an input S3 bucket; the application is deployed through the web console or command line tools; Elastic MapReduce runs Hadoop on Amazon EC2 instances; output goes to an output S3 bucket; notify at end; get results from Amazon S3
  • Protein interactions @ U. Washington: simple Python scripts automate the management of 1000s of simultaneous experiments using the EC2 API. http://faculty.washington.edu/danielt/ Source: Ed Lazowska
  • BLAT @ U. Penn: map 100 million, 100 base paired end reads. A quad core with 5 GB of RAM would take 16 days. Diagram (one master, four slaves): boot master and slaves; submit the BLAT job; upload inputs; initial process splits fasta file; subsequent jobs BLAT smaller files and save each result as it goes; download results. 30 high-memory instances; 32 hours; $195. Source: Angel Pizarro/John Hogenesch
  • 200 instances, 60000 structures, 4 hours. http://bioteam.net/aws
  • Crossbow: rapid whole genome SNP analysis. Preprocessed reads → Map: Bowtie → Sort: bin and partition → Reduce: SoapSNP. Langmead B, Schatz MC, Lin J, Pop M, Salzberg SL. Genome Biol 10(11): R134.
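
The map / sort / reduce shape of the Crossbow pipeline can be illustrated with a toy Python sketch. This is not Crossbow's code and not the Bowtie or SoapSNP algorithms: the naive aligner, the majority-vote caller, the reference string and the reads are all hypothetical stand-ins chosen to show how the three stages fit together.

```python
from collections import Counter, defaultdict

REFERENCE = "ACGTACGTTAGC"  # hypothetical toy reference sequence

def map_read(read, reference=REFERENCE, max_mismatches=1):
    """Map step (stand-in for Bowtie): align a read, emit (position, base) pairs."""
    best = None
    for offset in range(len(reference) - len(read) + 1):
        window = reference[offset:offset + len(read)]
        mismatches = sum(1 for a, b in zip(read, window) if a != b)
        if mismatches <= max_mismatches and (best is None or mismatches < best[0]):
            best = (mismatches, offset)
    if best is None:
        return []  # read does not align well enough; drop it
    offset = best[1]
    return [(offset + i, base) for i, base in enumerate(read)]

def bin_by_position(pairs):
    """Sort step: bin the observed bases by reference position."""
    bins = defaultdict(list)
    for position, base in pairs:
        bins[position].append(base)
    return dict(sorted(bins.items()))

def reduce_position(position, bases, reference=REFERENCE, min_support=2):
    """Reduce step (stand-in for SoapSNP): majority vote against the reference."""
    base, count = Counter(bases).most_common(1)[0]
    if base != reference[position] and count >= min_support:
        return (position, reference[position], base)
    return None

def call_snps(reads):
    pairs = [pair for read in reads for pair in map_read(read)]
    calls = (reduce_position(pos, bases) for pos, bases in bin_by_position(pairs).items())
    return [call for call in calls if call is not None]

# Three overlapping toy reads that all carry G where the reference has C.
reads = ["ACGTAGGT", "GTAGGTTA", "TAGGTTAG"]
print(call_snps(reads))
```

Each read is mapped independently and each position is reduced independently, which is what lets this style of analysis spread across many Hadoop nodes.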
  • data storage & distribution public & private
  • http://aws.amazon.com/publicdatasets/
  • sharing and collaboration
  • http://www.elasticr.net Elastic-R Collaborative Research Environment
  • software/pipeline distribution
  • http://www.cloudbiolinux.com/
  • http://usegalaxy.org/cloud
  • http://clovr.igs.umaryland.edu/
  • platforms & services
  • http://heroku.com
  • http://chempedia.com/
  • http://dnanexus.com
  • NexusDB: NexusDB Client, https/SSL, NexusDB Server, NexusDB Storage (S3). http://www.biodiscovery.com/index/biod-nexusdb-action
  • Cloud-based biosemantic apps http://syapse.com/
  • <4>
  • 2006 comparison. Large Service vs. Mid Size Source: James Hamilton
  • datacenter design efficiency: PUE < 1.2 - 1.5 (average > 2.0, Source: EPA). Source: James Hamilton
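
PUE (power usage effectiveness) is total facility power divided by the power delivered to IT equipment, so a PUE of 2.0 means one watt of overhead for every watt of computing. The sketch below works through the slide's figures; the 1000 kW IT load is a hypothetical value chosen for illustration.

```python
def pue(total_facility_kw, it_equipment_kw):
    """Power usage effectiveness: total facility power over IT equipment power."""
    if it_equipment_kw <= 0:
        raise ValueError("IT equipment power must be positive")
    return total_facility_kw / it_equipment_kw

def overhead_kw(it_equipment_kw, pue_value):
    """Power spent on cooling, distribution and so on for a given IT load."""
    return it_equipment_kw * (pue_value - 1.0)

IT_LOAD_KW = 1000.0  # hypothetical IT load, not from the talk

for label, value in [("efficient (PUE 1.2)", 1.2), ("average (PUE 2.0)", 2.0)]:
    print(f"{label}: {overhead_kw(IT_LOAD_KW, value):.0f} kW of overhead")
```

Under that assumed load, moving from the average PUE to 1.2 cuts the overhead power by a factor of five.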
  • multiple datacenters Source: James Hamilton
  • h/w costs efficiency optimization Source: James Hamilton
  • investments in automation, investments in s/w, investments in special skills
  • server utilization, infrastructure as opex, pay as you go/grow
  • Boot from EBS; AWS Multi Factor Authentication; US West Region; Virtual Private Cloud private beta; VPC Unlimited Beta; Lower Reserved Instance Pricing; ELB Support in Console; Reserved Instances in EU; Console Support for CloudWatch; CloudFront streaming; Elastic MapReduce; SQS in EU; EC2 Spot Instances; Windows 2008 Support; RDS Launched; Lowered Prices; New SimpleDB Features; AWS Security Center; High Memory Instances; AWS Economics Center; Console support for CloudFront; Reduced EC2 Pricing; FPS General Availability; EMR Apache Hive support; EC2 Reserved Instances; Elastic MapReduce in EU; SAS 70 Type II Audit; EC2 with Windows; AWS SDK for .NET; EC2 in EU; CloudFront Private Content; AWS Toolkit for Eclipse; EBS Shared Snapshots; APAC announced; SimpleDB in EU; Monitoring in EU; AWS Import/Export; Auto Scaling in EU; Lower pricing tiers for CloudFront; Elastic Load Balancing in EU; Monitoring, Auto Scaling, and Elastic Load Balancing; AWS Management Console; AWS Solutions Provider program; CloudFront adds access logging
  • there is no magic there is only awesome
  • Thank you! deesingh@amazon.com Twitter: @mndoci. Presentation ideas from James Hamilton, @mza and @lessig