CloudMan workshop

  1. Setting up a Cluster-on-the-Cloud
  2. How to use the cloud?
     1. Get an account on a supported cloud
     2. Start a master instance via the cloud web console or CloudLaunch
     3. Use CloudMan's web interface on the master instance to manage the platform
  3. Interaction flow: agenda details
     • Launch an instance
     • Demonstrate the following CloudMan features and prepare for the data analysis part:
       • Manual & Auto-scaling
       • Using an S3 bucket as a data source
       • Accessing an instance over ssh
       • Customizing an instance
       • Submitting an SGE job
       • Sharing an instance
  4. YOUR TURN
  5. Launch an instance
     1. Load biocloudcentral.org
     2. Enter the access key and secret key provided at http://bit.ly/bioworkshop
     3. Type any name as the cluster name
     4. Set any password (and remember it)
     5. Keep the Large instance type
     6. Start your instance
     Wait for the instance to start (~2-3 minutes).
     For more details, see wiki.galaxyproject.org/CloudMan
  6. Scaling computation
     (decision flowchart on the slide; only its YES/NO branch labels survived extraction)
  7. Interaction flow: workshop plan
     • Launch an instance ✓
     • Demonstrate the following CloudMan features and prepare for the data analysis part:
       • Manual & Auto-scaling
       • Using an S3 bucket as a data source
       • Accessing an instance over ssh
       • Customizing an instance
       • Submitting an SGE job
       • Sharing an instance
  8. Manual scaling
     • Explicitly add 1 worker node to your cluster
     • Node type corresponds to node processing capacity
     • Research the use of Spot instances
  9. Auto-scaling
  10. Auto-scaling in action
      Cluster size            Computation time   Computation cost
      Fixed, 5 nodes          9 hrs              $20
      Fixed, 20 nodes         6 hrs              $50
      Dynamic, 1 to 16 nodes  6 hrs              $20
      Cost tracks total node-hours consumed, so the dynamically sized cluster matches the 20-node runtime at the 5-node cost.
  11. Using an S3 bucket as a data source (a command-line sketch follows)
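      Only the slide title survived extraction. For reference, a bucket can also be read from the command line; a minimal sketch, assuming the AWS CLI is installed and configured on the instance, with a placeholder bucket name and an illustrative object name:
      # List the contents of a bucket (bucket name is a placeholder)
      $ aws s3 ls s3://workshop-bucket/
      # Copy one object from the bucket onto the instance
      $ aws s3 cp s3://workshop-bucket/mtDNA.fasta /mnt/galaxy/export/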
  12. COMMAND LINE ACCESS
  13. Accessing an instance over ssh
      Use the terminal (or install Secure Shell for Chrome).
      SSH in as user ubuntu with the password you chose when launching the instance (file copying works the same way; see below):
      [local machine]$ ssh ubuntu@<instance IP address>
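      The same credentials also let you copy local data up to the instance; a sketch, where the file name is a placeholder and the target directory is the one used on the customization slide below:
      [local machine]$ scp mydata.fasta ubuntu@<instance IP address>:/mnt/galaxy/export/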
  14. Once logged in
      • You have full system access to your instance, including sudo; use it as any other system
      • A suite of bioinformatics tools comes readily installed
      • You can submit jobs to a job manager via the standard qsub command (a quick check is sketched below)
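      A quick way to verify that job submission works, as a sketch (the script body is trivial on purpose):
      $ echo "hostname" > test_job.sh   # one-line job script
      $ qsub test_job.sh                # submit it to SGE
      $ qstat                           # watch it queue and run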
  15. Distributed Resource Manager (DRM)
      • Controls job execution on a (set of) resource(s)
      • A job: an invocation of a program
      • Manages job load
      • Provides the ability to monitor system and job status
      • Popular DRMs: Sun Grid Engine (SGE), Portable Batch System (PBS), TORQUE, Condor, Load Sharing Facility (LSF)
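      In SGE, per-job options can be embedded in the script itself as #$ directives, as the job script two slides down does. A minimal sketch, where the program and input names are placeholders:
      #!/bin/bash
      #$ -N demo_job   # job name shown in qstat
      #$ -cwd          # run from the submission directory
      #$ -j y          # merge stdout and stderr into one output file
      ./my_program input.txt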
  16. Customize your instance
      Install a new tool (a quick test run is sketched after these steps):
      $ cd /mnt/galaxy/export
      $ wget http://heanet.dl.sourceforge.net/project/dnaclust/parallel_release_3/dnaclust_linux_release3.zip
      $ unzip dnaclust_linux_release3.zip
      $ cd dnaclust_linux_release3
      $ chmod +x *
      Get a copy of a sample dataset:
      $ cp /mnt/workshop-data/mtDNA.fasta .
      Don't forget
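      Before going to the cluster, the tool can be sanity-checked directly on the master node, reusing the flags from the job script on the next slide against the copied sample dataset:
      $ ./dnaclust -l -s 0.9 mtDNA.fasta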
  17. Use the new tool in cluster mode (a submission loop for step 5 is sketched below)
      1. Create a sample shell script to run the tool; call it job_script.sh with the following content:
         #$ -cwd
         ./dnaclust -l -s 0.9 /mnt/workshop-data/mtDNA.fasta
      2. Submit a single job to the SGE queue: qsub job_script.sh
      3. Check the queue: qstat -f
      4. Job output will be in the local directory in file job_script.sh.o#
      5. Start a number of instances of the script (10 or so): qsub job_script.sh, then watch qstat -f and see all the jobs lined up
      6. See auto-scaling kick in (via the /cloud console) [1.5-2 mins]
      7. Go back to the command prompt and see jobs being distributed
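      One way to script step 5's repeated submission, using only the commands the slide itself gives (ten is the count the slide suggests):
      $ for i in $(seq 1 10); do qsub job_script.sh; done
      $ watch qstat -f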
  18. Sharing an instance
      • Share the entire CloudMan platform
      • Includes all of the user data and even the customizations
      • Publish a self-contained analysis
      • Make a note of the share-string and send it to your neighbor
  19. Troubleshooting
      (architecture diagram of the CloudMan boot sequence; its recoverable labels: the management console contextualizes the machine image, starts CloudMan, and sets up services; filesystems such as /mnt/galaxy and indices are assembled from snapshots on instance block storage; the persistent data repository lives in an S3/Swift bucket named cm-<hash>; worker instances run from the same CloudMan machine image alongside the application, e.g. Galaxy. Log files to check: /usr/bin/ec2autorun.log, /tmp/cm/cm_boot.log, /mnt/cm/paster.log; see the commands below.)
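      When something goes wrong, the log files named on the slide can be inspected over ssh:
      $ less /usr/bin/ec2autorun.log   # image contextualization log
      $ tail -f /tmp/cm/cm_boot.log    # CloudMan boot log
      $ tail -f /mnt/cm/paster.log     # CloudMan application log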
  20. Time for lunch?
      You should now have a functional CloudMan instance, configured as a Galaxy cluster.
      • If not, start one over lunch
