
Supercomputing by API: Connecting Modern Web Apps to HPC


Audience Level
Intermediate

Synopsis
The traditional user experience for High Performance Computing (HPC) centers on the command line and the intricacies of the underlying hardware. At the same time, scientific software is moving towards the cloud, leveraging modern web-based frameworks that allow rapid iteration and a renewed focus on portability and reproducibility. This software still needs the huge scale and specialist capabilities of HPC, but leveraging these resources is hampered by variation in implementation between facilities. Differences in software stack, scheduling systems and authentication all get in the way of developers who would rather focus on the research problem at hand. This presentation reviews efforts to overcome these barriers. We will cover container technologies, frameworks for programmatic HPC access, and RESTful APIs that can deliver this functionality as a hosted solution.

Speaker Bio
Dr. David Perry is Compute Integration Specialist at The University of Melbourne, working to increase research productivity using cloud and HPC. David chairs Australia’s first community-owned wind farm, Hepburn Wind, and is co-founder/CTO of BoomPower, delivering simpler solar and battery purchasing decisions for consumers and NGOs.

Supercomputing by API: Connecting Modern Web Apps to HPC

  1. SUPERCOMPUTING BY API: CONNECTING WEB APPS TO HPC. Dr. David Perry, Compute Integration Specialist, University of Melbourne
  2. [Diagram] Virtual Laboratory: Web Server, Database, Worker Nodes
  3. [Diagram] Virtual Laboratory (Web Server, Database, Worker Nodes) connected to a Supercomputer (Login Node, Compute Nodes)
  4. YAY SUPERCOMPUTERS!
  5. THE PROBLEM. Each HPC cluster has its own: scheduler, software/OS, hardware.
  6. THE DREAM. Write once, run anywhere. No platform dependencies. Consistent RESTful API for ... everything.
  7. SOLUTIONS! (sort of)
  8. TODAY: 1. HPC APIs 2. Containers
  9. THE IDEAL HPC API: Consistent interface across schedulers. Manages files. Works across system boundaries. Doesn't require changes to the HPC cluster (no new software, network ports, or security risks). Multiple language bindings/wrappers.
  10. DRMAA
  11. import drmaa

      # Create session
      s = drmaa.Session()
      s.initialize()

      # Create job
      jt = s.createJobTemplate()
      jt.remoteCommand = "echo 'hello'"
      jt.nativeSpecification = "--mincpus=2"
      jt.hardWallclockTimeLimit = '1:00:00'

      # Run it
      jobid = s.runJob(jt)
      print('Your job has been submitted with ID %s' % jobid)

      # Wait for it to complete
      retval = s.wait(jobid, drmaa.Session.TIMEOUT_WAIT_FOREVER)
      print('Job: {0} finished with status {1}'.format(retval.jobId, retval.hasExited))
      s.exit()
  12. [Diagram: Scheduler A, Scheduler B, Scheduler C, and the features supported by DRMAA]
  13. Good: Supported by almost all schedulers. Bad: Unfriendly, local access only, limited scheduler feature support, no longer under active development.
  14. SAGA
  15. import saga
      import os

      # Run job using SAGA over SSH
      ctx = saga.Context("ssh")
      ctx.user_id = 'perryd'
      os.environ['SAGA_PTY_SSH_TIMEOUT'] = '60'
      session = saga.Session()
      session.add_context(ctx)

      js = saga.job.Service("slurm+ssh://spartan.hpc.unimelb.edu.au/", session=session)
      jd = saga.job.Description()
      jd.executable = "echo 'hello' > hello.out"
      jd.wall_time_limit = 5  # minutes

      # Create and submit job, wait for it to finish.
      myjob = js.create_job(jd)
      myjob.run()
      print('Job Running')
      myjob.wait()
      print('Job %s finished with status %s' % (myjob.id, myjob.exit_code))
  16. # Fetch output files
      output = 'file://localhost/tmp/'
      source = 'sftp://spartan.hpc.unimelb.edu.au/home/perryd/hello.out'
      saga.filesystem.File(source, session=session).copy(output)
      print('Remote file contents:')
      print(open('/tmp/hello.out').read())
  17. Good: Supports popular schedulers, works over SSH, nothing to install on the cluster, handles file transfers. Bad: Still not a web API.
  18. AGAVE
  19. Via RESTful API: Execution & Storage Systems, Monitoring, Metadata, Permissions, History, Events
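(As an illustration of driving a job through Agave's RESTful interface, here is a rough sketch using Python requests. It is not from the talk: the tenant base URL, app ID, storage URI and job fields are illustrative assumptions; the exact job-request schema is defined by the Agave jobs service and the apps registered with it.)

      import requests

      # Assumptions: an Agave v2 tenant base URL and a previously obtained
      # OAuth2 bearer token; the app ID and input URI are placeholders.
      BASE_URL = "https://public.agaveapi.co"
      HEADERS = {"Authorization": "Bearer REPLACE_WITH_ACCESS_TOKEN"}

      job_request = {
          "name": "hello-hpc",
          "appId": "my-echo-app-1.0",
          "maxRunTime": "01:00:00",
          "archive": True,
          "inputs": {"inputFile": "agave://my-storage-system/data/input.txt"},
          "parameters": {},
      }

      # Submit the job to the jobs service
      resp = requests.post(BASE_URL + "/jobs/v2", json=job_request, headers=HEADERS)
      resp.raise_for_status()
      job = resp.json()["result"]
      print("Submitted job %s (status: %s)" % (job["id"], job["status"]))

      # Poll the same RESTful endpoint for the job's status
      status = requests.get(BASE_URL + "/jobs/v2/" + job["id"], headers=HEADERS)
      status.raise_for_status()
      print("Current status:", status.json()["result"]["status"])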
  20. Demo
  21. Agave ToGo https://togo.agaveapi.co
  22. Good: Hosted. RESTful, OpenAPI-compliant. Does everything. Bad: Hosted. RESTful, OpenAPI-compliant. Does everything.
  23. On to containers...
  24. Why?
  25. What versions of Bowtie are available? At Melbourne: At Monash: At NCI: Bowtie2/2.2.5-GCC-4.9.2 Bowtie2/2.2.5-intel-2016.u3 Bowtie2/2.2.9-GCC-4.9.2 Bowtie2/2.2.9-intel-2016.u3 bowtie/1.1.2 bowtie2/2.2.8 bowtie/1.2.0 bowtie2/2.1.0 bowtie2/2.2.5 bowtie2/2.2.9 bowtie2/2.3.1
  26. SINGULARITY: Image-based (just a big file with everything in it). Flat network/hardware access. Volume mounts similar to Docker.
  27. DEMO
  28. 1. Get or create a container.
      $ sudo singularity create -s 6000 my_container.img
      $ sudo singularity bootstrap my_container.img ubuntu.def
      $ sudo singularity shell -w my_container.img
      my_container.img> # Do stuff in a container
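(Slide 28 bootstraps the image from ubuntu.def without showing its contents. A minimal sketch of what such a definition file might contain, in the Singularity 2.x bootstrap format; the Ubuntu release and packages are illustrative assumptions, not taken from the talk.)

      BootStrap: debootstrap
      OSVersion: xenial
      MirrorURL: http://archive.ubuntu.com/ubuntu/

      %post
          # Commands run inside the image at bootstrap time
          apt-get update
          apt-get -y install python wget

      %runscript
          # Default command when the container is executed
          echo "Hello from inside the container"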
  29. $ sudo singularity create -s 6000 digits_docker.img
      $ sudo singularity --verbose import digits_docker.img docker://nvidia/digits:latest
  30. 2. Run your container.
      $ singularity exec -B /tmp:/jobs digits_docker.img bash -c "export DIGITS_JOBS_DIR=/jobs && python -m digits"
  31. As an HPC job:
      #!/bin/bash
      #SBATCH --nodes 1
      #SBATCH --cpus-per-task=12
      #SBATCH --partition gpu
      #SBATCH --gres=gpu:4
      #SBATCH --time 02:00:00

      LOGIN_PORT=$(shuf -i 2000-65000 -n 1)
      DIGITS_PORT=5000
      module load Singularity

      # Forward a random port on the login node back to DIGITS on this compute node
      ssh -N -f -R $LOGIN_PORT:localhost:$DIGITS_PORT $SLURM_SUBMIT_HOST
      echo "Forwarding to port:"
      echo $LOGIN_PORT

      singularity exec -B /tmp:/jobs -B /tmp:/scratch digits_docker.img bash -c "export DIGITS_JOBS_DIR=/jobs && python -m digits"
  32. CAVEATS: Hardware/architecture dependencies are still there. Beware the golden image.
  33. CONCLUSION: Supercomputer-enable your web app! But you can't ignore the details of each supercomputer. There are tools out there to make life a bit easier.
  34. MORE EXPLORATION. Project looking at: APIs (inc. local Agave deployment), Virtual Laboratory to HPC, Single Sign-on, Knowledge Sharing.
  35. ACKNOWLEDGEMENTS: Nectar VL managers & developers. Authors of SAGA, DRMAA and Agave. Lev Lafayette & Daniel Tosello.
