David Kelly SWIFT


Published in: Data & Analytics, Technology
  • The Swift parallel scripting language is a high-level language for expressing the coordinated, parallel execution of application programs and scripts written in any other scripting or programming language. It was developed in 2006 as a successor to the less flexible workflow languages that inspired it, which date from around 2001. It’s a minimal language, meant to knit together tools written in other languages. This need is pervasive across scientific disciplines, and we work to apply Swift broadly. But it’s particularly relevant to Earth Systems sciences, where simulations and instruments typically present their outputs as large datasets of many files. Regarding parallelism: the user application tools that a Swift script executes may themselves be parallel programs that use many cores or many nodes on a cluster or supercomputer. Regarding location: express the flow of your computations, and keep the plumbing details and the machine-specific details out of this dataflow description. That gives flexibility, and lets optimizing tools like Swift implement the mechanical but complex aspects of portable distributed execution.
    1. 1. Swift: A Scientist’s Gateway to Campus Clusters, Grids and Supercomputers
       David Kelly, Computation Institute, University of Chicago and Argonne National Laboratory
       Swift project: www.swiftlang.org
       Presenter contact: davidkelly@uchicago.edu, swift-support@ci.uchicago.edu
    2. 2.
       • Parallel scripting language for clusters, clouds & grids
         – For writing loosely-coupled scripts of application programs and utilities linked by exchanging files
         – Can call scripts in shell, Python, R, Octave, MATLAB, …
       • Swift does 3 important things for you:
         – Makes parallelism transparent, with functional dataflow
         – Makes basic failure recovery transparent
         – Makes computing location transparent: can run your script on multiple distributed sites and diverse computing resources (from desktop to petascale)
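
The first two points above (parallelism through dataflow, and basic failure recovery) can be illustrated with a small Python analogy. This is only a sketch of the idea, not how Swift is implemented: `with_retries`, `make_flaky_square`, and `run_pipeline` are hypothetical names made up for the illustration, and the "application" is a toy function whose first call fails on purpose.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def with_retries(task, attempts=3):
    """Return a version of `task` that is re-run on failure, up to `attempts` times."""
    def wrapper(*args):
        last_error = None
        for _ in range(attempts):
            try:
                return task(*args)
            except Exception as err:  # a real system would retry only transient errors
                last_error = err
        raise last_error
    return wrapper


def make_flaky_square():
    """Hypothetical task: squares its input, but the very first call fails."""
    state = {"calls": 0}
    lock = threading.Lock()

    def flaky_square(x):
        with lock:
            state["calls"] += 1
            first_call = state["calls"] == 1
        if first_call:
            raise RuntimeError("simulated transient failure")
        return x * x

    return flaky_square


def run_pipeline(values):
    """Square every value in parallel, then sum the results."""
    task = with_retries(make_flaky_square())
    with ThreadPoolExecutor(max_workers=4) as pool:
        # Each submit returns a future immediately, so all squares may run in parallel.
        squares = [pool.submit(task, v) for v in values]
        # The downstream step waits only on the futures it actually consumes.
        return sum(f.result() for f in squares)


if __name__ == "__main__":
    print(run_pipeline([1, 2, 3]))
```

The caller of `run_pipeline` never sees the simulated failure or the thread pool, which is the sense in which both concerns are "transparent" in this sketch.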
    3. 3. MODIS script excerpt

       $ cat modis.swift
       type file;

       app (file output) getLandUse (file input) {
         getLandUse @input;
       }

       file getLandUse_inputs[] <filesys_mapper; pattern="images/*.rgb">;

       foreach i in getLandUse_inputs {
         file output <single_file_mapper; file=@strcat(i, ".output")>;
         output = getLandUse(i);
       }
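
For readers who do not know Swift, the shape of the MODIS excerpt (map a set of `*.rgb` files, run one application per file, name each output after its input) can be sketched in Python. This is a rough analogy under stated assumptions, not a translation of Swift semantics: `get_land_use` here is a hypothetical stand-in that just records the input size, and `process_images` is an invented helper name.

```python
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def get_land_use(input_path):
    """Stand-in for the getLandUse application: writes the input's size in bytes."""
    output_path = input_path.with_name(input_path.name + ".output")
    output_path.write_text(str(input_path.stat().st_size))
    return output_path


def process_images(image_dir):
    """Run get_land_use on every *.rgb file in image_dir, in parallel."""
    inputs = sorted(Path(image_dir).glob("*.rgb"))  # plays the role of the file mapper
    with ThreadPoolExecutor() as pool:
        # Like the foreach loop: iterations are independent, so they may overlap.
        return list(pool.map(get_land_use, inputs))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for name in ("tile1.rgb", "tile2.rgb"):
            (Path(tmp) / name).write_bytes(b"\x00" * 16)
        for out in process_images(tmp):
            print(out.name, out.read_text())
```

Unlike this sketch, which only uses local threads, the Swift script expresses the same loop without saying where or how the iterations run.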
    4. 4. Same script runs on broad range of resources; separate throttles can be set for each site. User can run on multiple resources:
       [Diagram labels: Cluster interactive node; swift; Output; Logs; Cluster file server; Input; Swift script; Applications; config files]
       • UChicago UC3: UC3 seeder nodes, Midwest T2 nodes, OSG VO nodes
       • Beagle Cray: Cray XE 24-core nodes
       • UChicago Midway: Sandybridge nodes, Westmere nodes, Department nodes
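
The idea of a separate throttle per site can be sketched in Python with one semaphore per site. This is an illustrative analogy only, not Swift's scheduler or its configuration format: the throttle values, the `run_jobs` helper, and the first-free-slot placement rule are all assumptions made for the example, and the "job" is a short sleep.

```python
import threading
import time
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical per-site throttles (maximum concurrent jobs); not values from the slides.
SITE_THROTTLES = {"midway": 2, "beagle": 4, "uc3": 6}


def run_jobs(n_jobs, throttles):
    """Run n_jobs placeholder jobs, never exceeding any site's throttle.

    Returns (jobs placed per site, peak concurrency observed per site).
    """
    slots = {site: threading.BoundedSemaphore(limit) for site, limit in throttles.items()}
    placed, active, peak = Counter(), Counter(), Counter()
    lock = threading.Lock()

    def run_one(job_id):
        while True:
            for site, slot in slots.items():
                if slot.acquire(blocking=False):  # take a free slot if the site has one
                    try:
                        with lock:
                            placed[site] += 1
                            active[site] += 1
                            peak[site] = max(peak[site], active[site])
                        time.sleep(0.001)  # stand-in for the real application run
                        return site
                    finally:
                        with lock:
                            active[site] -= 1
                        slot.release()
            time.sleep(0.0005)  # every site is full: wait briefly and retry

    with ThreadPoolExecutor(max_workers=16) as pool:
        list(pool.map(run_one, range(n_jobs)))
    return placed, peak


if __name__ == "__main__":
    placed, peak = run_jobs(60, SITE_THROTTLES)
    print(dict(placed), dict(peak))
```

The exact split of jobs between sites depends on timing and will vary from run to run; the only guarantee in this sketch is that no site exceeds its throttle.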
    5. 5. Swift’s location-independent scripting lets the user focus on science
       • Example of running 3,000 jobs across 3 hosts, including the UC3 campus collective:
           Midway    289
           Beagle   1070
           UC3      1641
           Total    3000
       • The user started on a basic login host processing 10 files and moved up to a 3,000-file dataset, changing only the dataset name and a site-specification list to get to the resources above
       • Expanded the scope of their computations from one node to hundreds or thousands of cores
       • User didn’t need to look at what sites were busy, or adjust arcane scripts, to get to these resources.
    6. 6.
       • Swift is a parallel scripting system for grids, clouds and clusters
         – For loosely-coupled applications: application and utility programs linked by exchanging files
       • Swift is easy to write: simple high-level C-like functional language
         – Small Swift scripts can do large-scale work
       • Swift is easy to run: contains all services for running Grid workflow in one Java application
         – Untar and run: acts as a self-contained Grid client
       • Swift is fast: uses efficient, scalable and flexible distributed execution engine
         – Scales to range of 1M tasks per script run
       • Swift usage is growing:
         – Applications in earth systems sciences, neuroscience, proteomics, molecular dynamics, biochemistry, economics, statistics, and more
       • Try Swift: www.swiftlang.org
    7. 7. Parallel Computing, Sep 2011
    8. 8. Face-IT: Framework to Advance Climate, Economic, and Impact Investigations with Information Technology
       • Collaboration between University of Chicago, Columbia University, Purdue University, and the University of Florida
       • Galaxy – scientific workflow system
       • Combine the best parts of Swift and Galaxy, and apply to Earth Systems
       www.ci.uchicago.edu/swift www.mcs.anl.gov/exm
    9. 9. Galaxy Screenshots
    10. 10. Galaxy Screenshots