Why The Cloud Is A Computational Biologist's Best Friend



Pros and Cons of cloud computing for biocomputation

Published in: Health & Medicine

Speaker notes:
  • Blue = services I’ve used
  • Describe I/O
  • Mention cost calculator: http://calculator.s3.amazonaws.com/calc5.html
  • Go to security menu

    1. Amazon Cloud: A Religious Experience
       Yannick Pouliot, 2/23/2012
    2. Amazon Cloud services in a nutshell: highly flexible storage and compute power, sold on a pay-per-use basis
    3. Why the Cloud?
       • Complete flexibility of computing power and storage
       • Grow or shrink as needed
       • Any number of machines
       • Ridiculously powerful machines made affordable on a short-term lease to tackle a particular task (e.g., 15 billion ANOVAs)
       • Unusual architectures (e.g., GPUs)
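The 15B-ANOVA example works because each test is independent, so the job can be cut into slices and each slice handed to its own rented machine. A minimal sketch of that slicing step, assuming tests are numbered by an integer index; `partition_tasks` is a hypothetical helper, not part of any Amazon tool:

```python
def partition_tasks(n_tasks: int, n_workers: int) -> list:
    """Split task indices 0..n_tasks-1 into n_workers contiguous, near-equal ranges."""
    if n_workers <= 0:
        raise ValueError("n_workers must be positive")
    base, extra = divmod(n_tasks, n_workers)
    chunks = []
    start = 0
    for i in range(n_workers):
        # The first `extra` workers each take one more task, so sizes differ by at most 1.
        size = base + (1 if i < extra else 0)
        chunks.append(range(start, start + size))
        start += size
    return chunks


# Example: 15 billion independent tests spread over 8 rented machines.
# Each machine would receive its (start, stop) pair and run only that slice.
for worker, r in enumerate(partition_tasks(15_000_000_000, 8)):
    print(f"machine {worker}: tests {r.start:,} to {r.stop - 1:,} ({len(r):,} tests)")
```

Contiguous ranges keep the hand-off tiny (two integers per machine), which matters when the machines are billed by the hour.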
    4. There are many cloud providers… but Amazon is the clear leader, IMO
    5. Q: What does working with a cloud machine feel like?
       A: It’s not materially different from accessing a machine on our cluster, except that you can do anything you want
    6. Main Services Provided by Amazon Cloud
       • Storage
         ▫ Traditional disk volumes
         ▫ S3 buckets (“Simple Storage Service”)
       • Computing (EC2, “Elastic Compute Cloud”)
         ▫ Single machine instances
         ▫ Clusters of various types
       • Machine types
         ▫ Compute servers
         ▫ Database servers
         ▫ Cluster
         ▫ Specialized architectures
         ▫ Variety of operating systems (Linux flavors, Windows)
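Getting results in and out of S3 buckets is usually scripted. The slides do not show code; this sketch assumes boto3 (Amazon's SDK for Python), whose S3 client provides `upload_file(Filename, Bucket, Key)`. The client is passed in so the function can be tried without an AWS account; `upload_results`, the bucket name, and the paths are hypothetical:

```python
from pathlib import Path


def upload_results(s3_client, bucket: str, prefix: str, paths: list) -> list:
    """Copy local files into an S3 bucket under `prefix` and return their s3:// URIs.

    `s3_client` is expected to behave like boto3's S3 client, whose
    upload_file(Filename, Bucket, Key) method performs the transfer.
    """
    uris = []
    for p in paths:
        # S3 has no real directories: the "folder" is just a prefix on the key.
        key = f"{prefix.rstrip('/')}/{Path(p).name}"
        s3_client.upload_file(p, bucket, key)
        uris.append(f"s3://{bucket}/{key}")
    return uris


# Usage (needs boto3 plus AWS credentials; bucket and paths are placeholders):
#   import boto3
#   upload_results(boto3.client("s3"), "my-lab-bucket", "run-01", ["results/anova.tsv"])
```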
    7. Types of Instances
       • Based on the definition of the virtual machine
         ▫ I/O bus
         ▫ Number of CPUs
         ▫ Memory
         ▫ Type of CPU, cluster
       • Deployment: Spot market vs. “Reserved”
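Because instance types differ in CPUs, memory, and price, choosing one amounts to taking the cheapest type that meets a job's requirements. A minimal sketch; the catalog entries, their sizes, and their prices are invented placeholders, not real AWS instance types or rates, and `cheapest_fit` is hypothetical:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class InstanceType:
    name: str
    vcpus: int
    memory_gib: float
    hourly_usd: float


# Hypothetical catalog: names, sizes, and prices are placeholders, not real AWS figures.
CATALOG = [
    InstanceType("type-a", 1, 2.0, 0.10),
    InstanceType("type-b", 2, 8.0, 0.40),
    InstanceType("type-c", 4, 16.0, 0.80),
    InstanceType("type-d", 8, 64.0, 2.00),
]


def cheapest_fit(catalog, min_vcpus: int, min_memory_gib: float) -> Optional[InstanceType]:
    """Return the lowest-priced type meeting the CPU and memory floor, or None."""
    candidates = [
        t for t in catalog if t.vcpus >= min_vcpus and t.memory_gib >= min_memory_gib
    ]
    return min(candidates, key=lambda t: t.hourly_usd, default=None)


print(cheapest_fit(CATALOG, min_vcpus=2, min_memory_gib=12))  # a 2-CPU job needing 12 GiB
```

Here the 2-CPU type is skipped because it lacks memory, so the 4-CPU type wins; on a real account the same comparison would run over Amazon's published sizes and either spot or reserved prices.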
    8. Costs
       • You pay for (almost) everything you do
         ▫ Data transfers (out)
         ▫ Storage
         ▫ CPU cycles (depends on instance type; one small instance is free)
       • Can purchase cycles below the average market price (spot market)
         ▫ Can give you vast amounts of computing power at a price you can afford
       • Research grants from Amazon
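The cost model on this slide is additive: machine-hours, stored data, and outbound transfer are each billed separately. A rough sketch of that arithmetic; every rate below is a made-up placeholder (real figures come from Amazon's pricing pages or cost calculator), and `estimate_cost` is hypothetical:

```python
def estimate_cost(instance_hours: float, hourly_rate: float,
                  storage_gb_months: float, storage_rate: float,
                  egress_gb: float, egress_rate: float) -> dict:
    """Rough pay-per-use bill: compute + storage + outbound transfer.

    Inbound transfer (upload) is treated as free, matching the slides.
    All rates are supplied by the caller; none are real AWS prices.
    """
    compute = instance_hours * hourly_rate
    storage = storage_gb_months * storage_rate
    transfer = egress_gb * egress_rate  # only data going *out* is billed
    return {"compute": compute, "storage": storage, "transfer": transfer,
            "total": compute + storage + transfer}


# Placeholder rates: 100 machine-hours at $0.50/h, 200 GB stored for one month
# at $0.10/GB-month, and 50 GB downloaded at $0.12/GB.
bill = estimate_cost(100, 0.50, 200, 0.10, 50, 0.12)
print({k: round(v, 2) for k, v in bill.items()})
```

Buying cycles on the spot market shows up here simply as a lower `hourly_rate`; the rest of the bill is unchanged.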
    9. Controlling Your Services
       • Web-based console
       • Command-line tools
         ▫ EC2 API tools
       • Third-party systems: RightScale
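Besides the console and the EC2 API tools, instances can be started from a script. This sketch swaps in boto3, Amazon's SDK for Python, rather than the command-line tools named on the slide; it relies on the EC2 client's `run_instances` call. `launch_instances` is hypothetical, and the client is passed in so the logic can be exercised without AWS:

```python
def launch_instances(ec2_client, image_id: str, instance_type: str,
                     count: int, key_name: str) -> list:
    """Start `count` identical instances from one machine image; return their IDs.

    `ec2_client` is expected to behave like boto3's EC2 client (run_instances).
    """
    response = ec2_client.run_instances(
        ImageId=image_id,
        InstanceType=instance_type,
        MinCount=count,  # with MinCount == MaxCount, EC2 launches all or none
        MaxCount=count,
        KeyName=key_name,  # SSH key pair used to log in afterwards
    )
    return [inst["InstanceId"] for inst in response["Instances"]]


# Usage (needs boto3 plus AWS credentials; all argument values are placeholders):
#   import boto3
#   ids = launch_instances(boto3.client("ec2"), "ami-xxxxxxxx", "<instance-type>", 4, "my-key")
```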
    10. Using & Distributing Instances
        • You can always make images of your instances for later use or backup
        • Images can be made public
        • You can launch other people’s images (i.e., public images), e.g.,
          ▫ CloudBioLinux: pre-made biocomputational instances
          ▫ Galaxy Cloud: pre-made cluster-based Galaxy instance (web-based, no less)
          ▫ PathSeq: pre-made comprehensive Bowtie engine that uses Hadoop
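Imaging an instance and publishing the image can also be scripted. A sketch assuming boto3's EC2 client, which provides `create_image`, the `image_available` waiter, and `modify_image_attribute`; granting launch permission to the group `all` is how an image is made public. `make_image` and its arguments are hypothetical, and account-level settings may still block public sharing:

```python
def make_image(ec2_client, instance_id: str, name: str, public: bool = False) -> str:
    """Save an instance as a machine image (AMI); optionally let anyone launch it.

    `ec2_client` is expected to behave like boto3's EC2 client.
    """
    image_id = ec2_client.create_image(InstanceId=instance_id, Name=name)["ImageId"]
    if public:
        # Wait until the image is ready before changing who may launch it.
        ec2_client.get_waiter("image_available").wait(ImageIds=[image_id])
        # Granting launch permission to the group "all" makes the image public.
        ec2_client.modify_image_attribute(
            ImageId=image_id,
            LaunchPermission={"Add": [{"Group": "all"}]},
        )
    return image_id


# Usage (needs boto3 plus AWS credentials; instance ID and name are placeholders):
#   import boto3
#   make_image(boto3.client("ec2"), "i-xxxxxxxx", "my-pipeline-2012-02", public=True)
```

A private image serves as a backup; a public one is how pre-built environments such as those listed above get shared.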
    11. Issues
        • Security
          ▫ Lots of it
        • Data transfers
          ▫ Free for upload; $ for download
          ▫ No big deal, so far
          ▫ Can send drives…
        • Latency
          ▫ No big deal
        • Small “ephemeral” storage
          ▫ A gotcha if you don’t know about it
        • Max 1 terabyte per disk
          ▫ Hmm…
        • “Max” 20 disks per instance
          ▫ Can be circumvented
        • No sharing of disks between instances, usually
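The disk limits above (1 TB per disk, 20 disks per instance, as of this 2012 talk) turn into a quick capacity check before moving a dataset. A minimal sketch; `plan_volumes` is hypothetical, and its defaults mirror the slide rather than current AWS limits. Presenting several volumes as one file system is typically done by striping them (e.g., software RAID 0), which this sketch does not attempt:

```python
import math


def plan_volumes(dataset_tb: float, max_volume_tb: float = 1.0, max_volumes: int = 20) -> int:
    """Number of full-size volumes needed for a dataset, under per-disk and per-instance caps."""
    if dataset_tb <= 0:
        return 0
    n = math.ceil(dataset_tb / max_volume_tb)
    if n > max_volumes:
        raise ValueError(
            f"{dataset_tb} TB needs {n} volumes, above the {max_volumes}-volume limit"
        )
    return n


print(plan_volumes(3.5))  # a 3.5 TB dataset under the slide's 1 TB-per-disk limit
```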
    12. Support
        • Unless you purchase support, you’re on your own
        • Hasn’t been an issue for me, though finding a solution can take time…
        Support options:
    13. Questions?
