Published in: Education, Technology

  1. Member of the Helmholtz Association. ISSGC’09, 7th International Summer School on Grid Computing. UNICORE day at ISSGC’09. Presenters: Rebecca Breu, Bastian Demuth, Mathilde Romberg, Jülich Supercomputing Centre (JSC). 6 July 2009, 7 July 2009
  2. Agenda. 9:00 – 10:30: Principles of Job Submission and Execution Management (set the scene). 11:00 – 12:30: UNICORE Architecture and Components (technical overview of how UNICORE works and how it is used). 14:00 – 15:30: UNICORE Basic Practical (submitting jobs with the command-line client). 16:00 – 17:30: UNICORE Workflow Practical (submitting workflows with the graphical client). 18:00 – 19:00: UNICORE: An Application Example (applications using UNICORE). 07/07/2009
  3. Session 9: Principles of Job Submission and Execution Management. Author/Presenter: Achim Streit, Mathilde Romberg, Jülich Supercomputing Centre (JSC).
  4. Job Submission
  5. Jobs. A job is some work to be executed; it requires CPU and memory and possibly accesses additional resources, e.g., storage, devices, services. Job scheduling is the policy for assigning jobs to resources. (Courtesy of Prof. Felix Wolf, RWTH Aachen)
  6. Resources. Compute: memory, central processing units, nodes, threads/tasks. Data: size, transfer rate. Network: bandwidths, ...
  7. How to Differentiate Compute Resources? By number of CPUs: single processor or multiprocessor. Multiprocessor systems can be grouped into: shared memory (equal access time to memory from each processor); distributed memory (each CPU has its own memory and I/O; different address spaces); distributed shared memory (shared address space; access time depends on the location of the data in memory).
  8. Multiprocessor Systems – Examples. SMP (symmetric (shared-memory) multiprocessors): IBM Power 4/5/6 node, multi-core chips (pictured: Jülich). MPP (massively parallel processor): IBM Blue Gene/P, Cray XT4 (pictured: Jülich, Helsinki). NUMA (non-uniform memory access): SGI Altix (pictured: Munich). Cluster: Mare Nostrum, IBM Power4/5/6 system, Tera-10, self-built cluster (pictured: Barcelona).
  9. Job Scheduling
  10. Job Scheduling. A policy for assigning jobs to resources. Inputs are a set of jobs with requirements and a set of resources. Criteria for assignment: fairness; efficiency (minimize response time for interactive users and turnaround time for batch jobs; maximize throughput). (Courtesy of Prof. Felix Wolf, RWTH Aachen)
  11. Usage of Multiprocessor Systems. Typically the resource demands of users/jobs are greater than the available resources, so users/jobs compete. Typically resource requirements differ from one user (or application) to another: large/small (in number of processors), large/small (in amount of memory), long/short (in duration of resource usage). Some form of resource management and job scheduling is required! How should the available resources be shared among the competing jobs? When does a job start, and which resources are assigned to it?
  12. Resource Management & Job Scheduling – 1: Time-sharing (or time-slicing). Several jobs share the same resource and are executed quasi-simultaneously; resources are not exclusively assigned to jobs. Each job uses the resource only for short time slices (some clock ticks of the processor) and needs more than a single time slice to complete. Each job gets the resource assigned in a round-robin fashion. New jobs start immediately, but execution takes longer than on a dedicated resource. Typically handled by the operating system. Examples: SMP machines, your own Linux PC.
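A minimal Python sketch of the round-robin time-sharing described on slide 12, assuming a single shared processor and abstract time units. The function name `round_robin`, the job names, and the work amounts are illustrative and do not come from the slides.

```python
from collections import deque


def round_robin(jobs, time_slice):
    """Simulate round-robin time-sharing on one processor.

    jobs: dict mapping a job name to the processor time it needs.
    Returns a dict mapping each job name to its completion time.
    """
    queue = deque(jobs.items())
    clock = 0
    finished = {}
    while queue:
        name, remaining = queue.popleft()
        run = min(time_slice, remaining)  # a job never runs past its own end
        clock += run
        remaining -= run
        if remaining > 0:
            queue.append((name, remaining))  # back to the end of the queue
        else:
            finished[name] = clock
    return finished


# Hypothetical workload: three jobs that all arrive at time 0.
completion = round_robin({"A": 3, "B": 5, "C": 2}, time_slice=1)
print(completion)
```

In this workload job A needs 3 units of processor time but completes at time 7, which illustrates the point from the slide: every job starts right away, yet each one takes longer than it would on a dedicated resource.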
  13. Resource Management & Job Scheduling – 2: Space-sharing (or space-slicing). Resources are exclusively assigned to a job until it completes. Jobs may have to wait until enough resources are free before they start. Needs a separate resource management system (also known as a batch system) and a job scheduler. Examples: MPP systems, clusters, etc.; LoadLeveler, Torque + Maui, PBSPro, OpenCCS, SLURM, … Space-sharing-based resource management and job scheduling is commonly used on clusters and other multiprocessor systems.
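A minimal Python sketch of space-sharing as described on slide 13, assuming a strict first-come-first-served queue in which all jobs are submitted at time 0. Real batch systems such as those named on the slide use far richer policies; the function name `fcfs_space_sharing` and the example workload are illustrative only.

```python
import heapq


def fcfs_space_sharing(jobs, total_procs):
    """Assign processors exclusively to jobs in queue order.

    jobs: list of (name, procs, runtime) tuples, all submitted at time 0.
    Returns a dict mapping each job name to its (start, end) times.
    """
    free = total_procs
    clock = 0
    running = []  # min-heap of (end_time, procs) for jobs holding processors
    schedule = {}
    for name, procs, runtime in jobs:
        if procs > total_procs:
            raise ValueError(f"job {name} can never fit on this machine")
        # The job waits until enough processors have been released.
        while free < procs:
            end_time, released = heapq.heappop(running)
            clock = max(clock, end_time)
            free += released
        schedule[name] = (clock, clock + runtime)
        heapq.heappush(running, (clock + runtime, procs))
        free -= procs
    return schedule


# Hypothetical 8-processor machine and workload.
plan = fcfs_space_sharing(
    [("A", 4, 10), ("B", 4, 5), ("C", 8, 3), ("D", 2, 4)], total_procs=8
)
print(plan)
```

Job D only starts at time 13 here, although 4 processors are idle between times 5 and 10. A backfilling scheduler could use that gap; this sketch deliberately does not.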
  14. Job Submission on Multiprocessor Systems. Example: LoadLeveler. IBM Tivoli Workload Scheduler LoadLeveler is available for AIX and Linux. Basic LoadLeveler commands: llsubmit (submit a job); llq (show queued and running jobs); llcancel <job_id> (delete a queued or running job); llstatus (display status information). Job submission via job command file: llsubmit <cmdfile>
  15. LoadLeveler cmd_file examples – 1: IBM Blue Gene/P system @ Jülich (JUGENE)
        # @ job_name = BGP-LoadL-Sample-1
        # @ comment = "BGP Job by Size"
        # @ error = $(job_name).$(jobid).out
        # @ output = $(job_name).$(jobid).out
        # @ environment = COPY_ALL;
        # @ wall_clock_limit = 00:20:00
        # @ notification = error
        # @ notify_user =
        # @ job_type = bluegene
        # @ bg_size = 32
        # @ queue
        /usr/local/bin/mpirun -exe `/bin/pwd`/wait_bgp.rts -mode VN -np 48 -verbose 1 -args "-t 1"
      Slide annotations: wall_clock_limit is the runtime/duration; bg_size is the size of the partition; the last line is the executable, only mpirun!
  16. LoadLeveler cmd_file examples – 2: IBM p690 eServer Cluster 1600 @ Jülich (JUMP)
        #@ job_type = parallel
        #@ output = out.$(jobid).$(stepid)
        #@ error = err.$(jobid).$(stepid)
        #@ wall_clock_limit = 00:15:00
        #@ notify_user =
        #@ node = 2
        #@ total_tasks = 64
        #@ data_limit = 1.5GB
        #@ queue
        myprogram
      Slide annotations: wall_clock_limit is the runtime/duration; node, total_tasks and data_limit are the resource requirements; myprogram is the executable. node: number of nodes for the job. total_tasks: number of total tasks in the job.
  17. Job Submission on Multiprocessor Systems. Example: Torque + Maui. Torque is the resource manager; Maui is the cluster scheduler. Basic Maui commands: msub (submit a new job); showq (display a detailed and prioritized list of active and idle jobs); canceljob (cancel existing jobs); showstart (show the estimated start time of idle jobs); showstats (show detailed usage statistics for the users, groups, and accounts the user has access to).
  18. Job submission in Maui. Via command line: msub -l nodes=32:ppn=2,pmem=1800mb,walltime=3600 myscript. The resource list (-l) requests 32 nodes with 2 processors each, 1800 MB per task, and 3600 seconds duration; myscript is the script file.
  19. Lessons Learned. Each job submission system is different: different commands for submission, status query, and cancellation; different options, scheduling policies, …; even different configurations of the same job submission system exist on different multiprocessor systems. Job requirements are specified differently: as command-line parameters of the job submission command, or in a separate job command file. Different job requirements exist: nodes and tasks per node, total tasks, …
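The point of slide 19 can be sketched in Python: one abstract job description has to be rendered differently for each batch system. The sketch below uses only the LoadLeveler keywords and the `msub -l` resource names that appear on slides 16 and 18. The dictionary keys (`nodes`, `tasks_per_node`, `mem_per_task_mb`, `walltime_s`, `script`, `command`) and both function names are hypothetical, and the memory requirement is left out of the LoadLeveler output because the slides do not show an equivalent per-task keyword.

```python
def format_hms(seconds):
    """Format a duration in seconds as HH:MM:SS."""
    hours, rest = divmod(seconds, 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def to_loadleveler(job):
    """Render the job as the text of a LoadLeveler job command file."""
    lines = [
        "# @ job_type = parallel",
        f"# @ wall_clock_limit = {format_hms(job['walltime_s'])}",
        f"# @ node = {job['nodes']}",
        f"# @ total_tasks = {job['nodes'] * job['tasks_per_node']}",
        "# @ queue",
        job["command"],
    ]
    return "\n".join(lines)


def to_msub(job):
    """Render the job as an msub argument list (Torque + Maui)."""
    resources = (
        f"nodes={job['nodes']}:ppn={job['tasks_per_node']},"
        f"pmem={job['mem_per_task_mb']}mb,walltime={job['walltime_s']}"
    )
    return ["msub", "-l", resources, job["script"]]


# Hypothetical abstract job description, using the numbers from slide 18.
job = {
    "nodes": 32,
    "tasks_per_node": 2,
    "mem_per_task_mb": 1800,
    "walltime_s": 3600,
    "script": "myscript",
    "command": "myprogram",
}
print(to_loadleveler(job))
print(" ".join(to_msub(job)))
```

The same requirements end up as keyword lines in a file for one system and as a single command-line option for the other, which is the gap that a common description language is meant to close.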
  20. Job submission and the Grid. A higher, meta level with more abstraction is needed to describe the requirements of jobs in a Grid of heterogeneous systems. Many proprietary solutions exist; each Grid middleware uses its own language, e.g. AJO in UNICORE 5, ClassAds/JDL in Condor, JDL in gLite, RSL in Globus Toolkit, xRSL in ARC/NorduGrid, etc. And there is JSDL 1.0, an Open Grid Forum (OGF) standard.
  21. JSDL – Introduction. JSDL stands for Job Submission Description Language: a language for describing the requirements of computational jobs for submission to Grids and other systems. It can also be used between middleware systems or for submitting to any Grid middleware (interoperability). JSDL does not define a submission interface, what the results of a submission look like, or how resources are selected.
  22. JSDL Document. A JSDL document is an XML document, which may contain: generic (job) identification information; an application description; resource requirements (main focus is computational jobs); a description of required data files. A JSDL document is a template: it can be submitted multiple times and can be used to create multiple job instances. No job-instance-specific attributes can be defined, e.g. start or end time.
  23. JSDL – A Hello World Example
        <?xml version="1.0" encoding="UTF-8"?>
        <jsdl:JobDefinition xmlns:jsdl="" xmlns:jsdl-posix="">
          <jsdl:JobDescription>
            <jsdl:Application>
              <jsdl-posix:POSIXApplication>
                <jsdl-posix:Executable>/bin/echo</jsdl-posix:Executable>
                <jsdl-posix:Input>/dev/null</jsdl-posix:Input>
                <jsdl-posix:Output>std.out</jsdl-posix:Output>
                <jsdl-posix:Argument>hello</jsdl-posix:Argument>
                <jsdl-posix:Argument>world</jsdl-posix:Argument>
              </jsdl-posix:POSIXApplication>
            </jsdl:Application>
          </jsdl:JobDescription>
        </jsdl:JobDefinition>
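A minimal Python sketch of how a middleware component might read the Hello World document from slide 23 and recover the command line it describes. The transcript does not preserve the namespace URIs, so the two URIs below are an assumption based on the JSDL 1.0 specification as recalled and should be verified against it. The function name `parse_posix_job` and the shape of its return value are illustrative.

```python
import xml.etree.ElementTree as ET

# Assumption: namespace URIs recalled from the JSDL 1.0 specification;
# the slide transcript elides them, so verify before relying on them.
NS = {
    "jsdl": "http://schemas.ggf.org/jsdl/2005/11/jsdl",
    "posix": "http://schemas.ggf.org/jsdl/2005/11/jsdl-posix",
}

JSDL_DOC = f"""<?xml version="1.0" encoding="UTF-8"?>
<jsdl:JobDefinition xmlns:jsdl="{NS['jsdl']}"
                    xmlns:jsdl-posix="{NS['posix']}">
  <jsdl:JobDescription>
    <jsdl:Application>
      <jsdl-posix:POSIXApplication>
        <jsdl-posix:Executable>/bin/echo</jsdl-posix:Executable>
        <jsdl-posix:Input>/dev/null</jsdl-posix:Input>
        <jsdl-posix:Output>std.out</jsdl-posix:Output>
        <jsdl-posix:Argument>hello</jsdl-posix:Argument>
        <jsdl-posix:Argument>world</jsdl-posix:Argument>
      </jsdl-posix:POSIXApplication>
    </jsdl:Application>
  </jsdl:JobDescription>
</jsdl:JobDefinition>
"""


def parse_posix_job(jsdl_text):
    """Extract command line, stdin and stdout from a JSDL POSIX job."""
    root = ET.fromstring(jsdl_text.encode("utf-8"))
    app = root.find(".//posix:POSIXApplication", NS)
    if app is None:
        raise ValueError("no POSIXApplication element found")
    executable = app.findtext("posix:Executable", namespaces=NS)
    arguments = [arg.text for arg in app.findall("posix:Argument", NS)]
    return {
        "command": [executable] + arguments,
        "stdin": app.findtext("posix:Input", namespaces=NS),
        "stdout": app.findtext("posix:Output", namespaces=NS),
    }


job = parse_posix_job(JSDL_DOC)
print(job)
```

The parsed result says nothing about where or how the job runs, which matches the slide 21 statement that JSDL describes the job but not the submission interface.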
  24. JSDL – Resource Requirements Description. Supports simple descriptions of resource requirements; it is NOT a comprehensive resource requirements language, but it can be extended with other elements for richer or more abstract descriptions. The main target is compute jobs: CPU, memory, file system/disk, operating system requirements. Allows some flexibility for aggregate (total) requirements, e.g. “I want 10 CPUs in total and each resource should have 2 or more”. Very basic support for network requirements.
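The aggregate requirement quoted on slide 24 can be illustrated with a small Python sketch. It assumes one possible reading: only resources with at least the per-resource minimum are eligible, and whole resources are picked until their CPUs cover the requested total. JSDL itself does not define how resources are selected (slide 21), so the greedy strategy, the function name `select_resources`, and the resource names are purely illustrative.

```python
def select_resources(available, total_cpus, min_cpus_each):
    """Pick resources that together cover an aggregate CPU requirement.

    available: dict mapping a resource name to its number of free CPUs.
    Returns a list of chosen resource names, or None if the requirement
    cannot be met.
    """
    # Only resources with enough CPUs individually are eligible.
    eligible = [
        (name, cpus) for name, cpus in available.items() if cpus >= min_cpus_each
    ]
    # Greedy choice: largest resources first, name as a tie-breaker.
    eligible.sort(key=lambda item: (-item[1], item[0]))
    chosen = []
    covered = 0
    for name, cpus in eligible:
        if covered >= total_cpus:
            break
        chosen.append(name)
        covered += cpus
    return chosen if covered >= total_cpus else None


# "I want 10 CPUs in total and each resource should have 2 or more"
free_cpus = {"a": 1, "b": 4, "c": 2, "d": 8}  # hypothetical resources
print(select_resources(free_cpus, total_cpus=10, min_cpus_each=2))
```

Resource `a` is never considered because it has fewer than 2 CPUs, even though it would contribute to the total.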
  25. JSDL application extensions. SPMD (single-program-multiple-data): extends JSDL 1.0 for parallel applications (MPI, PVM, etc.); allows specifying the number of processors, processors per host, and threads per process along with the application. HPC (high performance computing): extends JSDL 1.0 for HPC applications running as operating system processes (e.g. username, environment, working directory can be specified).
  26. Lessons Learned. JSDL is a standardized language to describe jobs to be submitted to Grid resources: not only the job itself (application, arguments, input, output, etc.), but also resource requirements (CPU, memory, etc.). Extensions for specific application domains (parallel programs, HPC applications) exist. BUT: a JSDL document cannot be submitted directly to Grid resources, i.e. to the resource management and job scheduling system of a cluster or multiprocessor system in a Grid.
  27. Execution and Job Management – 1. One of the essential functionalities and components of a Grid middleware. It deals with: initiating/submitting, monitoring and managing jobs; handling and staging of all job data; coordinating and scheduling of multi-step jobs. Examples: XNJS in UNICORE 6 (sessions 10-12 today); WS-GRAM in GT4 (sessions 19-21 on Thursday); WMS in gLite (sessions 24-26 on Friday); ARC Client in NorduGrid/ARC (sessions 29-30 on Saturday).
  28. Execution and Job Management – 2: Initiating/submitting, monitoring and managing jobs. Translates the Grid job into a job specific to the target system (application details, resources, etc.). Submits the job to the resource management system using its proprietary way of job submission. Frequently polls the job status (waiting/queued, running/executing, failed, aborted, paused, finished, etc.) from the resource management system. Provides “access” to the job, its status and its data during its runtime and after its (successful or unsuccessful) completion. If the resource management system is not available/reachable at job submission time, the job is cached for a later handover to it.
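The responsibilities listed on slide 28 (translate, submit, poll, cache when the batch system is unreachable) can be sketched in Python. Everything here is hypothetical: `FakeBatchSystem` is a stand-in with a scripted status sequence, and `ExecutionManager` is not the API of XNJS or of any other middleware named in the slides.

```python
from collections import deque


class FakeBatchSystem:
    """Hypothetical stand-in for a resource management system."""

    def __init__(self):
        self.available = True
        self._jobs = {}
        self._next_id = 1

    def submit(self, native_job):
        if not self.available:
            raise ConnectionError("batch system unreachable")
        job_id = self._next_id
        self._next_id += 1
        # Scripted status sequence, consumed one state per status() call.
        self._jobs[job_id] = ["queued", "running", "finished"]
        return job_id

    def status(self, job_id):
        states = self._jobs[job_id]
        return states.pop(0) if len(states) > 1 else states[0]


class ExecutionManager:
    """Translates Grid jobs, submits them, caches them on failure, polls."""

    def __init__(self, batch):
        self.batch = batch
        self.pending = deque()  # jobs cached for a later handover
        self.submitted = {}     # Grid job name -> batch job id

    def translate(self, grid_job):
        # Turn the abstract description into a target-specific command line.
        return " ".join([grid_job["executable"], *grid_job["arguments"]])

    def submit(self, grid_job):
        try:
            job_id = self.batch.submit(self.translate(grid_job))
        except ConnectionError:
            self.pending.append(grid_job)
            return "cached"
        self.submitted[grid_job["name"]] = job_id
        return "submitted"

    def retry_pending(self):
        # Try each cached job once; failures are cached again by submit().
        for _ in range(len(self.pending)):
            self.submit(self.pending.popleft())

    def poll_until_done(self, name, max_polls=10):
        history = []
        for _ in range(max_polls):
            state = self.batch.status(self.submitted[name])
            history.append(state)
            if state in ("finished", "failed", "aborted"):
                break
        return history
```

A real implementation would poll with a delay and persist its cache; the sketch keeps everything in memory so that the control flow stays visible.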
  29. Execution and Job Management – 3. Handling and staging of all job data, incl. job directory and persistency: creates, manages, and destroys the job directory; all data submitted with the job as input is stored in the job directory; data is staged in from remote data resources/archives; all data generated by the job is preserved and/or staged out after the successful completion of the job. Coordinating and scheduling of multi-step jobs: if a job consists of more than one step (a workflow), the required resources are orchestrated; the proper initiation of the workflow execution is managed; the execution of the workflow is controlled and monitored.
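Coordinating a multi-step job, as described on slide 29, amounts to running each step only after the steps it depends on have completed. A minimal Python sketch using the standard-library `graphlib.TopologicalSorter` (Python 3.9 or later) follows; the step names, the `run_workflow` function, and the sequential execution are assumptions for illustration and are not taken from UNICORE.

```python
from graphlib import TopologicalSorter


def run_workflow(dependencies, run_step):
    """Run workflow steps in an order that respects their dependencies.

    dependencies: dict mapping each step to the set of steps it waits for.
    run_step: callable invoked once per step, in execution order.
    Returns the list of steps in the order they were run.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError if the workflow has a cycle
    order = []
    while sorter.is_active():
        # All steps returned here have their predecessors completed;
        # a real system could dispatch them in parallel.
        for step in sorted(sorter.get_ready()):
            run_step(step)
            order.append(step)
            sorter.done(step)
    return order


# Hypothetical workflow: stage data in, compute, analyse, stage data out.
workflow = {
    "stage_in": set(),
    "preprocess": {"stage_in"},
    "simulate": {"preprocess"},
    "analyse": {"simulate"},
    "visualise": {"simulate"},
    "stage_out": {"analyse", "visualise"},
}
executed = run_workflow(workflow, run_step=lambda step: print("running", step))
```

The steps `analyse` and `visualise` become ready at the same time; the sketch runs them one after the other in alphabetical order only to keep the output deterministic.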
  30. Lessons Learned. Execution and job management is needed. A meta-layer on top of the Grid resources is needed to provide a uniform way of accessing the Grid and an intuitive, secure and easy-to-use interface for the user.
  31. Introduction to UNICORE (from 30,000 ft). More in sessions 10-12, today.
  32. (Short) History Lesson. UNiform Interface to COmputing REsources: seamless, secure, and intuitive. Initial development started in two German projects funded by the German Ministry of Education and Research (BMBF). 08/1997 – 12/1999: UNICORE project. Results: well-defined security architecture with X.509 certificates, intuitive GUI, central job supervisor based on Codine (predecessor of SGE) from Genias. 1/2000 – 12/2002: UNICORE Plus project. Results: implementation enhancements (e.g. replacement of Codine by a custom NJS), extended job control (workflows), application-specific interfaces (plugins). Continuous development since 2002 in several EU projects. Open source community development since summer 2004.
  33. Projects. More than a decade of German and European research & development and infrastructure projects. [Timeline figure, 1999 to 2011] Projects shown: UNICORE, UNICORE Plus, EUROGRID, GRIP, GRIDSTART, OpenMolGRID, UniGrids, VIOLA, DEISA, DEISA2, NextGRID, CoreGRID, D-Grid IP, D-Grid IP 2, EGEE-II, OMII-Europe, A-WARE, eDEISA, Chemomentum, PHOSPHORUS, D-MON, PRACE, SmartLM, ETICS2, WisNetGrid, and many others.
  34. UNICORE – Grid driving HPC. Used in: DEISA (European Distributed Supercomputing Infrastructure); the national German supercomputing center NIC; the Gauss Center for Supercomputing (alliance of the three German HPC centers); PRACE (European PetaFlop HPC Infrastructure), currently starting up. But also used in non-HPC-focused infrastructures (e.g. D-Grid). Takes up major requirements from, e.g., HPC users, HPC user support teams, and HPC operations teams.
  35. UNICORE – Open source (BSD license). Open developer community on SourceForge; contributing your own developments is easily possible. Design principles: standards (OGSA-conform, WS-RF compliant); open, extensible, interoperable; end-to-end, seamless, secure and intuitive; security (X.509, proxy and VO support); workflow and application support directly integrated; variety of clients (graphical, command line, portal, API, etc.); quick and simple installation and configuration; support for many operating and batch systems; 100% Java 5.
  36. UNICORE in use: some examples
  37. Usage in D-Grid. Core D-Grid sites commit parts of their existing resources to D-Grid: approx. 700 CPUs and approx. 1 PByte of storage; UNICORE is installed and used. Additional sites received extra money from the BMBF for buying compute clusters and data storage: approx. 2000 CPUs and approx. 2 PByte of storage; UNICORE (as well as Globus and gLite) is installed as soon as systems are in place. [Site map; labels include LRZ and DLR-DFD]
  38. Distributed European Infrastructure for Supercomputing Applications. Consortium of leading national HPC centers in Europe. Deploy and operate a persistent, production-quality, distributed, heterogeneous HPC environment. UNICORE serves as Grid middleware on top of DEISA’s core services: dedicated network, shared file system, common production environment at all sites. Used e.g. for workflow applications. Sites: IDRIS – CNRS (Paris, France), FZJ (Jülich, Germany), RZG (Garching, Germany), CINECA (Bologna, Italy), EPCC (Edinburgh, UK), CSC (Helsinki, Finland), SARA (Amsterdam, NL), HLRS (Stuttgart, Germany), BSC (Barcelona, Spain), LRZ (Munich, Germany), ECMWF (Reading, UK). More in session 33, Monday 9:00 – 10:30.
  39. Interoperability and Usability of Grid Infrastructures. Focus on providing common interfaces and integration of major Grid software infrastructures: OGSA-DAI, VOMS, GridSphere, OGSA-BES, OGSA-RUS; UNICORE, gLite, Globus Toolkit, CROWN. Apply interoperability components in application use cases.
  40. Grid Services based Environment to enable Innovative Research. Provide an integrated Grid solution for workflow-centric, complex applications with a focus on data, semantics and knowledge. Provide decision support services for risk assessment, toxicity prediction, and drug design. End-user focus: ease of use, domain-specific tools, “hidden Grid”. Based on UNICORE 6. More in sessions 12-13, this afternoon.
  41. Commercial usage at [logo]. Slide courtesy of Alfred Geiger, T-Systems SfR.
  42. Lessons Learned. UNICORE has a strong HPC background but is not limited to HPC use cases; it can be used in any Grid. UNICORE is OGSA-conform and WS-RF compliant. UNICORE is open, extensible and interoperable. UNICORE is open source and coded in Java. UNICORE is used in EU and national projects, European e-infrastructures, National Grid Initiatives (NGI), commercial environments, etc. Documentation, tutorials, mailing lists, community links, software, source code, and more: