Talk at NCRR P41 Director's Meeting
Invited Talk given at the NCRR P41 Director's meeting on October 12, 2010

Published in: Technology
Transcript

  • 1. Amazon Web Services: A platform for life science research. Deepak Singh, Ph.D., Amazon Web Services. NCRR P41 PI meeting, October 2010
  • 2. the new reality
  • 3. lots and lots and lots and lots and lots of data
  • 4. lots and lots and lots and lots and lots of people
  • 5. lots and lots and lots and lots and lots of places
  • 6. constant change
  • 7. science in a new reality
  • 8. science in a new reality ^
  • 9. data science in a new reality ^
  • 10. Image: Drew Conway
  • 11. goal
  • 12. optimize the most valuable resource
  • 13. compute, storage, workflows, memory, transmission, algorithms, cost, …
  • 14. people (Credit: Pieter Musterd, under a CC-BY-NC-ND license)
  • 15. enter the cloud
  • 16. what is the cloud?
  • 17. infrastructure
  • 18. scalable
  • 19. 3000 CPUs for one firm's risk management application [Chart: Number of EC2 Instances, y-axis ticks at 300 and 3000; x-axis Wednesday 4/22/2009 through Tuesday 4/28/2009; annotation: 300 CPUs on weekends]
  • 20. highly available
  • 21. US East Region: Availability Zone A, Availability Zone B, Availability Zone C, Availability Zone D
  • 22. durable
  • 23. 99.999999999%
  • 24. dynamic
  • 25. extensible
  • 26. secure
  • 27. a utility
  • 28. on-demand instances, reserved instances, spot instances
  • 29. infrastructure as code
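  • 30.
      # Wraps the attribute hash that describes one EC2 instance
      class Instance
        attr_accessor :aws_hash, :elastic_ip

        def initialize(hash, elastic_ip = nil)
          @aws_hash = hash
          @elastic_ip = elastic_ip
        end

        # Public DNS name, or an empty string if none is assigned yet
        def public_dns
          @aws_hash[:dns_name] || ""
        end

        # Short host label; falls back to the instance status.
        # NOTE: `status` is not defined in this excerpt from the slide.
        def friendly_name
          public_dns.empty? ? status.capitalize : public_dns.split(".")[0]
        end

        def id
          @aws_hash[:aws_instance_id]
        end
      end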
  • 31.
      include_recipe "packages"
      include_recipe "ruby"
      include_recipe "apache2"

      if platform?("centos", "redhat")
        if dist_only?
          # just the gem, we'll install the apache module within apache2
          package "rubygem-passenger"
          return
        else
          package "httpd-devel"
        end
      else
        %w{ apache2-prefork-dev libapr1-dev }.each do |pkg|
          package pkg do
            action :upgrade
          end
        end
      end

      gem_package "passenger" do
        version node[:passenger][:version]
      end

      # the echoed newlines accept the installer's interactive prompts
      execute "passenger_module" do
        command 'echo -en "\n\n\n\n" | passenger-install-apache2-module'
        creates node[:passenger][:module_path]
      end
  • 32.
      import time

      import boto
      import boto.emr
      from boto.emr.step import StreamingStep
      from boto.emr.bootstrap_action import BootstrapAction

      # set your aws keys and S3 bucket, e.g. from environment or .boto
      # (the values are left blank on the slide)
      AWSKEY = ""
      SECRETKEY = ""
      S3_BUCKET = ""
      NUM_INSTANCES = 1

      # Connect to Elastic MapReduce
      conn = boto.connect_emr(AWSKEY, SECRETKEY)

      # Install packages
      bootstrap_step = BootstrapAction(
          "download.tst",
          "s3://elasticmapreduce/bootstrap-actions/download.sh",
          None)

      # Set up mappers & reduces
      step = StreamingStep(
          name='Wordcount',
          mapper='s3n://elasticmapreduce/samples/wordcount/wordSplitter.py',
          cache_files=["s3n://" + S3_BUCKET + "/boto.mod#boto.mod"],
          reducer='aggregate',
          input='s3n://elasticmapreduce/samples/wordcount/input',
          output='s3n://' + S3_BUCKET + '/output/wordcount_output')

      jobid = conn.run_jobflow(
          name="testbootstrap",
          log_uri="s3://" + S3_BUCKET + "/logs",
          steps=[step],
          bootstrap_actions=[bootstrap_step],
          num_instances=NUM_INSTANCES)

      print("finished spawning job (note: starting still takes time)")

      # job state
      state = conn.describe_jobflow(jobid).state
      print("job state = %s" % state)
      print("job id = %s" % jobid)

      # The slide waits only for COMPLETED; FAILED and TERMINATED are added
      # here so the loop also ends when the job flow does not succeed.
      while state not in (u'COMPLETED', u'FAILED', u'TERMINATED'):
          print(time.localtime())
          time.sleep(30)
          state = conn.describe_jobflow(jobid).state
          print("job state = %s" % state)
          print("job id = %s" % jobid)

      # TIMESTAMP is never defined on the slide; the step's output path is used instead
      output_uri = "s3://" + S3_BUCKET + "/output/wordcount_output"
      print("final output can be found in " + output_uri)
      print("try: $ s3cmd sync " + output_uri + " .")
  • 33. a data science platform
  • 34. dataspaces Further reading: Jeff Hammerbacher, Information Platforms and the rise of the data scientist, Beautiful Data
  • 35. accept all data formats
  • 36. evolve APIs
  • 37. beyond the database and the data warehouse
  • 38. move compute to the data
  • 39. data is a royal garden
  • 40. compute is a fungible commodity
  • 41. “I terminate the instance and relaunch it. Thats my error handling” Source: @jtimberman on Twitter
  • 42. the cloud is an architectural and cultural fit for data science
  • 43. amazon web services
  • 44. your data science platform
  • 45. s3://1000genomes
  • 46. http://aws.amazon.com/publicdatasets/
  • 47. Credit: Angel Pizarro, U. Penn
  • 48. http://usegalaxy.org/cloud
  • 49. mapreduce for genomics http://bowtie-bio.sourceforge.net/crossbow/index.shtml http://contrail-bio.sourceforge.net http://bowtie-bio.sourceforge.net/myrna/index.shtml
  • 50. AWS knows scalable infrastructure
  • 51. you know the science
  • 52. we can make this work together
  • 53. http://aws.amazon.com/education http://aws.amazon.com/publicdatasets
  • 54. deesingh@amazon.com, Twitter: @mndoci, http://slideshare.net/mndoci, http://mndoci.com. Inspiration and ideas from Matt Wood, James Hamilton & Larry Lessig. Credit: Oberazzi, under a CC-BY-NC-SA license
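
Slides 40 and 41 treat compute as disposable: rather than repairing a broken instance, terminate it and launch a fresh one. The pattern might be sketched in Python roughly as follows. `FakeCloud`, `run_with_relaunch`, and `flaky_job` are hypothetical names invented for illustration; `FakeCloud` is an in-memory stand-in for a provisioning API and makes no AWS calls.

```python
import itertools


class FakeCloud:
    """In-memory stand-in for a cloud API (hypothetical, illustration only)."""

    def __init__(self):
        self._ids = itertools.count(1)
        self.running = set()
        self.terminated = []

    def launch(self):
        instance_id = "i-%04d" % next(self._ids)
        self.running.add(instance_id)
        return instance_id

    def terminate(self, instance_id):
        self.running.discard(instance_id)
        self.terminated.append(instance_id)


def run_with_relaunch(cloud, job, max_attempts=3):
    """Run job(instance_id) on a fresh instance; on failure, discard it and retry.

    Returns (result, attempts). Re-raises the last error once max_attempts
    instances have failed.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    last_error = None
    for attempt in range(1, max_attempts + 1):
        instance_id = cloud.launch()
        try:
            return job(instance_id), attempt
        except Exception as error:
            # No in-place repair: the instance is treated as disposable.
            last_error = error
        finally:
            # Every instance is terminated, whether the job succeeded or not.
            cloud.terminate(instance_id)
    raise last_error


# Usage: the first two instances "break", the third one succeeds.
cloud = FakeCloud()
failures = iter([True, True, False])


def flaky_job(instance_id):
    if next(failures):
        raise RuntimeError("instance %s is unhealthy" % instance_id)
    return "done on " + instance_id


result, attempts = run_with_relaunch(cloud, flaky_job)
```

The retry cap is a design choice of this sketch, not something the slides specify; without it, a job that fails for reasons unrelated to the instance would relaunch indefinitely.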

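
To give slide 23's durability figure some scale, the arithmetic below reads 99.999999999% as the probability that a single stored object survives one year. That reading is an assumption (the slide shows only the number), and the function names are hypothetical. Exact fractions are used so that the eleven nines are not lost to floating-point rounding.

```python
from fractions import Fraction

# Slide 23's figure, read here as a per-object, per-year survival probability
# (assumption: the slide does not state the time period).
DURABILITY = Fraction(99_999_999_999, 100_000_000_000)


def expected_annual_losses(num_objects, durability=DURABILITY):
    """Expected number of objects lost per year out of num_objects stored."""
    return num_objects * (1 - durability)


def years_per_expected_loss(num_objects, durability=DURABILITY):
    """Average number of years between single-object losses."""
    return 1 / expected_annual_losses(num_objects, durability)


losses_10k = expected_annual_losses(10_000)
years_10k = years_per_expected_loss(10_000)
years_1b = years_per_expected_loss(1_000_000_000)
```

Under that assumption, storing 10,000 objects corresponds to an expected loss of one object every 10,000,000 years, and storing a billion objects to one every 100 years.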