Cluster Tutorial 
I. Introduction and Prerequisites 
This short manual gives guidance on setting up a proper work environment with the three clusters available in our institute. The first question to ask yourself is: "Do I need the cluster for my problem?" From experience I can tell you: mostly not, because the cost of reprogramming the solution in a parallel manner often far exceeds the benefit. Therefore, please check the following questions; if you answer any of them with yes, it makes sense to solve your problem on the cluster. 
• I have a huge set of data which won't fit into the memory of a single machine. 
• I have a huge set of data which won't fit into the memory of a single machine, and which I cannot split into chunks because I have to jump back and forth within it. 
• I have so many iterations to perform for my simulation that it would take a couple of months without the cluster. 
• The routine I write for the cluster can be used daily by my peers for the next ten years. 
It might sound odd to some, but in general one should not underestimate the initial effort to get started. If you are still not deterred, make sure that the following conditions are met. 
• The institute's cluster-enabled MATLAB is installed on your office computer. 
• HPC Cluster Manager and Job Manager are installed on the very same machine. 
• You have an Active Directory account, e.g. PHYSIK3HansWurscht. 
If these requirements are not met, please file a ticket at https://opto1:9676/portal stating that you want to join the cluster users, and we will set you up right away. 
We have three clusters available, named ALPHA, GAMMA and SKYNET. They serve different purposes, so it makes sense to match your problem to the specific grid. 
• SKYNET: an HPGPU (high-performance GPU) cluster which is quite experimental and requires a high degree of expertise. You can also run regular jobs here; it is not forbidden. It has up to 80 workers and eight Tesla M2050 GPUs, which are very powerful. 
• ALPHA: an HPC cluster which makes use of the office computers when they are idle, for example at night or on weekends. Since this cluster can shrink and grow depending on available resources, there is no fixed size, but the maximum is somewhere around 500 workers. 
• GAMMA: an HPC cluster with 16 workers but 32 GB of memory. If you must submit a job with huge memory requirements, this is the recommended grid.
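As a sketch of how to target a specific grid from the command line: the profile names below are an assumption (they are taken to match the cluster names above), and `defaultParallelConfig` is the configuration function of the MATLAB generation this tutorial is written for.

```matlab
% Sketch: select the profile that fits the job before opening a pool.
% Assumes profiles named after the clusters (ALPHA, GAMMA, SKYNET) exist.
defaultParallelConfig('GAMMA');                 % memory-hungry job -> GAMMA
[active, allConfigs] = defaultParallelConfig(); % query the current setting
disp(active)                                    % name of the active profile
disp(allConfigs)                                % every imported profile
```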
II. Connect to the cluster 
Connecting to the cluster is as easy as making coffee. Please download the profiles from the project server https://projects.gwdg.de/projects/cluster and import them into your local MATLAB application (Fig. 2). Afterwards, it is recommended to run the test routines which check your configuration; it is very important that they are all marked as passed (Fig. 4). Where to find the button is shown in Fig. 1. The system may ask for authentication once; please connect with your regular office computer credentials. The dialog window which appears looks like Fig. 3. 
Figure 1: Manage Cluster 
Figure 2: Import or find Clusters 
Figure 3: Connect to cluster with AD credentials 
Figure 4: Test Cluster Connectivity 
III. Monitor the jobs 
On the local computer there is a program called Job Manager, which is used to monitor the cluster resources. If, for instance, a job hangs or you want to cancel one, this is the necessary tool. 
Fig. 5 shows the typical layout of the Job Manager; to cancel your job, right-click on it and choose cancel. To control different clusters, you need to point the Job Manager at the right cluster head node, as shown in Fig. 6. It is very important that you kill your jobs if they hang; otherwise the other users cannot use the cluster at full capacity. 
Figure 5: Job Manager 
Figure 6: Select Head Node
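Jobs can also be cancelled from the MATLAB command line when the Job Manager GUI is not at hand. The following is a sketch using the same distcomp interface as the submission examples in section IV; the 'State' filter value is an assumption about your scheduler's job properties.

```matlab
% Sketch: find and cancel hung jobs programmatically instead of via the GUI.
sched = findResource();                     % scheduler from the default profile
stuck = findJob(sched, 'State', 'running'); % jobs still marked as running
for k = 1:numel(stuck)
    cancel(stuck(k));  % frees the workers for the other cluster users
end
```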
IV. Programming Tutorial 
The programs explained in this tutorial are available for direct use in the MATLAB Editor; please visit the project server https://projects.gwdg.de/projects/cluster and download the example files folder. Add it to your local MATLAB path, otherwise the interpreter cannot find the files. First you will need to select the parallel configuration; this can be a cluster, or your local machine if it has several CPU cores (Fig. 7). 
Figure 7: Select Profile 
For more information on configurations and programming with user configurations, see: 
http://www.mathworks.com/access/helpdesk/help/toolbox/distcomp/index.html?/access/helpdesk/help/toolbox/distcomp/f5-16141.html#f5-16540
A. Using an Interactive MATLAB pool 
To interactively run your parallel code, you first need to open a MATLAB pool. This reserves a collection of MATLAB worker sessions to run your code. The MATLAB pool can consist of MATLAB sessions running on your local machine or on a remote cluster; in this case, we are initially running on your local machine. You can use matlabpool open to start an interactive worker pool. If the number of workers is not defined, the default number defined in your configuration will be used. A good rule of thumb is not to open more workers than there are cores available. If the Configuration argument is not provided, matlabpool will use the default configuration as set up in the beginning of this section. When you are finished with your MATLAB pool, you can close it using matlabpool close. Two of the main parallel constructs that can be run on a MATLAB pool are parfor loops (parallel for-loops) and spmd blocks (single program, multiple data blocks). Both constructs allow for a straightforward mixture of serial and parallel code. 
parfor loops are used for task-parallel (i.e. embarrassingly parallel) applications; parfor is used to speed up your code. Below, a simple for loop is converted into a parfor to run in parallel, with different iterations of the loop running on different workers. The code outside the parfor loop executes as traditional MATLAB code (serially, in your client MATLAB session). 
Note: The example below is located in the m-file, ‘parforExample1.m’. 
matlabpool open 2 % can adjust according to your resources 
N = 100; 
M = 200; 
a = zeros(N,1); 
tic; % serial (regular) for-loop 
for i = 1:N 
    a(i) = a(i) + max(eig(rand(M))); 
end 
toc; 
tic; % parallel for-loop 
parfor i = 1:N 
    a(i) = a(i) + max(eig(rand(M))); 
end 
toc; 
matlabpool close 
spmd blocks are a single program multiple data (SPMD) language construct. The "single program" aspect of spmd means that the identical code runs on multiple labs. The code within the spmd body executes simultaneously on the MATLAB workers. The "multiple data" aspect means that even though the spmd statement runs identical code on all workers, each worker can have different, unique data for that code. spmd blocks are useful when dealing with large data that cannot fit on a single machine. Unlike parfor, spmd blocks support inter-worker communication. They allow: 
• Arrays (and operations on them) to be distributed across multiple workers 
• Messages to be explicitly passed among workers.
The example below creates a distributed array (different parts of the array are located on different workers) and computes the SVD of this distributed array. The spmd block returns the data in the form of a Composite object (which behaves similarly to a cell array in serial MATLAB; for specifics, see the documentation link below). 
Note: The example below is located in the m-file, ‘spmdExample1.m’. 
matlabpool open 2 % can adjust according to your resources 
M = 200; 
spmd 
    N = rand(M,M,codistributor); % 200x100 chunk per worker 
    A = svd(N); 
end 
A = max(A{1}); % indexing into the Composite object 
disp(A) 
clear N 
matlabpool close 
For information on matlabpool, see: 
http://www.mathworks.com/access/helpdesk/help/toolbox/distcomp/index.html?/access/helpdesk/help/toolbox/distcomp/matlabpool.html 
For information about getting started using parfor loops, see: 
http://www.mathworks.com/access/helpdesk/help/toolbox/distcomp/index.html?/access/helpdesk/help/toolbox/distcomp/brb2x2l-1.html 
For information about getting started using spmd blocks, see: 
http://www.mathworks.com/access/helpdesk/help/toolbox/distcomp/index.html?/access/helpdesk/help/toolbox/distcomp/brukbno-2.html 
For information regarding composite objects: 
http://www.mathworks.com/access/helpdesk/help/toolbox/distcomp/index.html?/access/helpdesk/help/toolbox/distcomp/brukctb-1.html 
For information regarding distributed arrays: 
http://www.mathworks.com/access/helpdesk/help/toolbox/distcomp/index.html?/access/helpdesk/help/toolbox/distcomp/bqi9fln-1.html
1. Using Batch to Submit Serial Code – Best Practice for Scripts 
batch sends your serial script to run on one worker in your cluster. All of the variables in your client workspace (i.e. the MATLAB process you are submitting from) are sent to the worker by default. You can alternatively pass a subset of these variables by defining the Workspace argument and passing the desired variables in a structure. After your job has finished, you can use the load command to retrieve the results from the worker workspace back into your client workspace. In this and all following examples, we use wait to ensure the job is done before we load the worker workspace back in. This is optional, but you cannot load the data from a task or job until that task or job is finished, so we use wait to block the MATLAB command line until that occurs. If the Configuration argument is not provided, batch will use the default configuration that was set up above. 
Note: For this example to work, you will need 'testBatch.m' on the machine that you are submitting from (i.e. the client machine). The example below is located in the m-file 'submitJob2a.m'. 
%% This script submits a serial script using batch 
job2a = batch('testBatch'); 
wait(job2a); % only can load when job is finished 
sprintf('Finished Running Job') 
load(job2a); % loads all variables back 
sprintf('Loaded Variables into Workspace') 
% load(job2a, 'A'); % only loads variable A 
destroy(job2a) % permanently removes job data 
sprintf('Test Completed') 
If you have submitted successfully, you should see the following variables appear in your client workspace: 
Figure 9: Workspace 
For more information on batch, see: 
http://www.mathworks.com/access/helpdesk/help/toolbox/distcomp/index.html?/access/helpdesk/help/toolbox/distcomp/batch.html 
and here: 
http://www.mathworks.com/access/helpdesk/help/toolbox/distcomp/index.html?/access/helpdesk/help/toolbox/distcomp/brjw1e5-1.html#brjw1fx-3 
Figure 8: Batch Job
2. Using Batch to Submit Scripts that Run Using a MATLAB pool 
batch with the 'matlabpool' option sends scripts containing parfor or spmd to run on workers via a MATLAB pool. In this process, one worker behaves like a MATLAB client process: it facilitates the distribution of the job amongst the workers in the pool and runs the serial portion of the script. Therefore, specifying a 'matlabpool' of size N will actually result in N+1 workers being used. Just as in step 2a, all variables are automatically sent from your client workspace (i.e. the workspace of the MATLAB you are submitting from) to the workers' workspace on the cluster. load then brings the results from the workers' workspace back into your client's workspace. If a configuration is not specified, batch uses the default configuration as defined in the beginning of this section. 
Note: For this example to work, you will need 'testParforBatch.m' on the machine that you are submitting from (i.e. the client machine). The example below is located in the m-file 'submitJob2b.m'. 
%% This script submits a parfor script using batch 
job2b = batch('testParforBatch','matlabpool',2); 
wait(job2b); % only can load when job is finished 
sprintf('Finished Running Job') 
load(job2b); % loads all variables back 
sprintf('Loaded Variables into Workspace') 
% load(job2b, 'A'); % only loads variable A 
destroy(job2b) % permanently removes job data 
sprintf('Test Completed') 
If you have submitted successfully, you should see the following variables appear in your client workspace: 
Figure 10: Workspace Batch Pool
The above code submitted a script containing a parfor. You can submit a script containing an spmd block in the same fashion by changing the name of the submission script in the batch command. 
Note: For this example to work, you will need 'testSpmdBatch.m' on the machine that you are submitting from (i.e. the client machine). The example below is located in the m-file 'submitJob2b_spmd.m'. 
%% This script submits a spmd script using batch 
job2b = batch('testSpmdBatch','matlabpool',2); 
wait(job2b); % only can load when job is finished 
sprintf('Finished Running Job') 
load(job2b); % loads all variables back 
sprintf('Loaded Variables into Workspace') 
% load(job2b, 'A'); % only loads variable A 
destroy(job2b) % permanently removes job data 
sprintf('Test Completed') 
If you have submitted successfully, you should see the following variables appear in your client workspace: 
Figure 11: Batch Pool SPMD
B. Run Task-Parallel Example with Jobs and Tasks 
In this example, we are sending a task-parallel job with multiple tasks. Each task evaluates the built-in MATLAB function sum. The createTask function in the example below is passed the job, the function to be run as a function handle (@sum), the number of output arguments of the function (1), and the input argument to the sum function as a cell array ({[1 1]}). 
If not given a configuration, findResource uses the scheduler found in the default configuration defined in the beginning of this section. 
Note: This example is located in the m-file, ‘submitJob3a.m’. 
%% This script submits a job with 3 tasks 
sched = findResource(); 
job3a = createJob(sched); 
createTask(job3a, @sum, 1, {[1 1]}); 
createTask(job3a, @sum, 1, {[2 2]}); 
createTask(job3a, @sum, 1, {[3 3]}); 
submit(job3a) 
waitForState(job3a, 'finished') %optional 
sprintf('Finished Running Job') 
results = getAllOutputArguments(job3a); 
sprintf('Got Output Arguments') 
destroy(job3a) % permanently removes job data 
sprintf('Test Completed') 
If you have submitted successfully, you should see the following variables appear in your client workspace: 
Figure 12: Parallel Task 
results should contain the following: 
Figure 13: Terminal Output Task Parallel
You can also call a user-created function in the same way as shown above. In that case, you will need to make sure that any scripts, files, or functions that the task function uses are accessible to the cluster. You can do this by sending those files to the cluster via the FileDependencies property, or by directing the worker to a shared directory containing those files via the PathDependencies property. An example of using FileDependencies is shown below. 
Note: You will need a 'testTask.m' file on the machine you are submitting from for this example to work. This example is located in the m-file 'submitJob3b.m'. 
% This script submits a job with 3 tasks 
sched = findResource(); 
job3b = createJob(sched,'FileDependencies',{'testTask.m'}); 
createTask(job3b, @testTask, 1, {1,1}); 
createTask(job3b, @testTask, 1, {2,2}); 
createTask(job3b, @testTask, 1, {3,3}); 
submit(job3b) 
waitForState(job3b, 'finished') % optional 
sprintf('Finished Running Job') 
results = getAllOutputArguments(job3b); 
sprintf('Got Output Arguments') 
destroy(job3b) % permanently removes job data 
sprintf('Test Completed') 
If you have submitted successfully, you should see the following variables appear in your client workspace: 
Figure 14: Task Parallel Workspace
results should contain the following: 
Figure 15: Task Parallel Output 
For more information on File and Path Dependencies, see the below documentation. 
File Dependencies: http://www.mathworks.com/access/helpdesk/help/toolbox/distcomp/index.html?/access/helpdesk/help/toolbox/distcomp/filedependencies.html 
Path Dependencies: 
http://www.mathworks.com/access/helpdesk/help/toolbox/distcomp/index.html?/access/helpdesk/help/toolbox/distcomp/pathdependencies.html 
For a more general overview of sharing code between client and workers, see: 
http://www.mathworks.com/access/helpdesk/help/toolbox/distcomp/index.html?/access/helpdesk/help/toolbox/distcomp/bqur7ev-2.html#bqur7ev-9
C. Run Task-Parallel Example with a MATLAB pool job – Best Practice for parfor or spmd in functions 
In this example, we are sending a MATLAB pool job with a single task. This is nearly equivalent to sending a batch job (see step 2b) with a parfor or an spmd block, except this method is best used when sending functions rather than scripts. It behaves just like the jobs/tasks explained in step 3. The function referenced in the task contains a parfor. 
Note: For this example to work, you will need 'testParforJob.m' on the machine that you are submitting from (i.e. the client machine). This example is located in the m-file 'submitJob4.m'. 
% This script submits a function that contains parfor 
sched = findResource(); 
job4 = createMatlabPoolJob(sched,'FileDependencies',... 
{'testParforJob.m'}); 
createTask(job4, @testParforJob, 1, {}); 
set(job4, 'MaximumNumberOfWorkers', 3); 
set(job4, 'MinimumNumberOfWorkers', 3); 
submit(job4) 
waitForState(job4, 'finished') % optional 
sprintf('Finished Running Job') 
results = getAllOutputArguments(job4); 
sprintf('Got Output Arguments') 
destroy(job4) % permanently removes job data 
sprintf('Test Completed') 
If you have submitted successfully, you should see the following variables appear in your client workspace: 
results{1} should contain a [50x1 double]. 
For more information on creating and submitting MATLAB pool jobs, see: 
http://www.mathworks.com/access/helpdesk/help/toolbox/distcomp/index.html?/access/helpdesk/help/toolbox/distcomp/creatematlabpooljob.html 
Figure 16: Workspace Variables SPMD in Functions
D. Run Data-Parallel Example 
In this step, we are sending a data-parallel job with a single task. The format is similar to that of jobs/tasks (see step 3). A parallel job has only one task; that task refers to a function that uses distributed arrays, labindex, or some MPI functionality. In this case, we are running a simple built-in function (labindex) which takes no inputs and returns a single output. labindex returns the ID of each worker process that ran it; its value spans from 1 to n, where n is the number of labs running the current job. 
Note: This example is located in the m-file 'submitJob5.m'. 
%% Script submits a data parallel job, with one task 
sched = findResource(); 
job5 = createParallelJob(sched); 
createTask(job5, @labindex, 1, {}); 
set(job5, 'MaximumNumberOfWorkers', 3); 
set(job5, 'MinimumNumberOfWorkers', 3); 
submit(job5) 
waitForState(job5, 'finished') % optional 
sprintf('Finished Running Job') 
results = getAllOutputArguments(job5); 
sprintf('Got Output Arguments') 
destroy(job5); % permanently removes job data 
sprintf('Test Completed') 
If you have submitted successfully, you should see the following variables appear in your client workspace: 
Figure 17: Workspace Data Parallel
results should contain the following: 
Figure 18: Results Data Parallel 
For more information on creating and submitting data parallel jobs, see: 
http://www.mathworks.com/access/helpdesk/help/toolbox/distcomp/index.html?/access/helpdesk/help/toolbox/distcomp/createparalleljob.html 
For more information on labindex, see: 
http://www.mathworks.com/access/helpdesk/help/toolbox/distcomp/index.html?/access/helpdesk/help/toolbox/distcomp/labindex.html 
E. Node GPU Processing 
If you need to accelerate execution even further, a good strategy is to use the built-in GPU/CUDA functions of the MATLAB distributed computing toolbox. Set the cluster profile to SKYNET and open a pool of workers: 
matlabpool open … 
The scheduler located on the head node of the cluster will recognize that your job contains GPU code and will automatically switch the scheduling profile to dispatch only to the nodes which have a GPU installed. In any case, it is a good idea to catch the error in case the scheduler does not work properly. The next block of source code shows an example of how to do that.
function testGPUInParfor() 
spmd 
    selectGPUDeviceForLab(); 
end 
parfor i = 1:1000 
    % Each iteration will generate some data A 
    A = rand(5555); 
    if selectGPUDeviceForLab() 
        A = gpuArray(A); 
        disp( 'Do it on the GPU' ) 
    else 
        disp( 'Do it on the host' ) 
    end 
    % Replace the following line with whatever task you need to do 
    S = sum(A,1); 
    % Maybe collect back from the GPU (gather is a no-op if not on the GPU) 
    S = gather(S); 
end 

function ok = selectGPUDeviceForLab() 
persistent hasGPU; 
if isempty( hasGPU ) 
    devIdx = mod(labindex-1, gpuDeviceCount()) + 1; 
    try 
        dev = gpuDevice( devIdx ); 
        hasGPU = dev.DeviceSupported; 
    catch %#ok 
        hasGPU = false; 
    end 
end 
ok = hasGPU; 
F. Avoid Errors – Use the Compiler To Your Advantage 
It may be counterintuitive, but in a parallel loop or parallel data block each single iteration of that block runs as a parallel task on a worker. The iterations must therefore be iteration-safe, since there is no guarantee that they run in ascending order. What I prefer to do is borrow the map-reduce approach; of course we do not reduce anything here, but the design pattern is great for preventing headaches. In the reduce step, a piece of data and a function are scheduled to a worker, where the function produces a return value. (In real map-reduce one can then proceed and use the output of step x-1 in step x, which we normally do not.) I can only advise you to make extensive use of function returns. The idea is that when a chunk of data is distributed along a dimension, or a chunk of iterations is distributed over the total number of iterations, every slice goes to the worker via a function call and comes back to the parallel block via a return value. This also has advantages for the underlying MPI, which would otherwise shovel all variables via multicast to all workers instead of only the needed slice to one worker, but that is beyond the scope of this document.
In addition, it is very clever to write your functions so that they can run in both the local and the cluster environment. For reference, see the example folder "sofi" on the project server. 
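A minimal sketch of such a dual-use function (the per-slice helper `processChunk` is hypothetical): parfor runs serially when no pool is open, so the very same function works on your office machine and on the cluster.

```matlab
function out = runAnywhere(data)
% Runs serially when no matlabpool is open, in parallel otherwise:
% parfor falls back to an ordinary for-loop if the pool size is 0.
n   = numel(data);
out = zeros(n, 1);
parfor i = 1:n
    out(i) = processChunk(data(i)); % hypothetical per-slice function
end
```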
Problem: 
parfor i = 1:N 
    u = u*v;       % direct in-loop update 
end 
Better: 
parfor i = 1:N 
    u = multi(u,v); % work goes through a function call and return 
end 
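The pattern above can be fleshed out as follows; this is a sketch in which `multi` stands in for your real per-slice computation. Each iteration reads and writes only its own slice u(i), and the work travels to the worker through a function call and comes back through the return value, so the iteration order does not matter:

```matlab
function u = sliceExample()
% Sketch of the function-return pattern: every slice goes to a worker
% via a function call and comes back via the return value.
N = 100;
u = rand(N, 1);
v = 3;
parfor i = 1:N
    u(i) = multi(u(i), v); % sliced variable: iteration-safe
end

function w = multi(a, b)
% Placeholder for the real per-slice computation.
w = a * b;
```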
G. Summary Chart for Scheduling Options 
Figure 19: Scheduler Options
V. Glossary: 
HPGPU: high-performance GPU (graphics processing unit) cluster 
Worker: a worker executes one parallel task. In other words, if one has 100 workers available, one can run 100 iterations of a parallel loop in one time interval. 
Node: a node is a physical machine; for example, a computer connected to a cluster is a node of that cluster. A node has 16 workers if it has four CPUs with four cores each. 
MPI: Message Passing Interface, a piece of software which distributes processes around a grid via RPCs. 
RPC: Remote Procedure Call

HPC and HPGPU Cluster Tutorial

  • 1.
    Cluster Tutorial I.Introduction and prerequisits This short manual should give guidance how to set up a proper workenvironment with the three different clusters which we have available in our institute. The first question to ask yourself is the following:”Do I need the cluster for my problem?”. From experience I can tell mostly not, because sometimes the cost to reprogram the solution in a parallel manner is excceding the benefit by far. Therefor please check the following questions, if you answer them with yes, it makes absolutly sense to solve your problem with the cluster.  I have a huge a set of data which won’t fit to the memory of a single machine?  I have a huge a set of data which won’t fit to the memory of a single machine, which I cannot split in different chunks because I have to jump back and fort within them?  I have plenty of iterations to perform for my simulation which I want to put on the cluster because it would take a couple of months otherwise.  The routine I wrote for the cluster can be used by my peers for the next ten years and will be used daily. It might sound odd for some, but in general one should not underestimate the initial effort to get started. If you are still not deterred, you must make sure that the following conditions are met.  The special Matlab is installed on your office computer.  HPC Cluster Manager and Job Manager are installed on the very same machine.  You have an Active Directory account, aka PHYSIK3HansWurscht. If those requirements are not met, please write a ticket to https://opto1:9676/portal describing that you want to participate in the cluster clique and we will set you up within the same moment. We have three different clusters available, termed ALPHA, GAMMA and SKYNET. They do all serve different purposes, thus it makes total sense to fit your problem to the specific grid. 
 SKYNET: Is a HPGPU (High performance graphiccard processing unit) which is very experimentally and needs a high degree of expertise. But you can also run regular jobs here, it is not forbidden. It has up to 80 Workers. It has eight M2050 Tesla GPU’s, which are pretty insane.  ALPHA: Is a HPC which makes use of the office computers when those are non busy, for example at night or on weekends. Since that cluster can shrink and grow depending on available resources there is no absolute number available, but the maximum is somewhere around 500 workers.  GAMMA: Is a HPC with 16 Workers but 32GB Memory, in case one must submitt a job with a huge requirement in terms of memory it is recomendet to use this grid.
  • 2.
    II. Connect tothe cluster Connecting to the cluster is easy as making coffee. Please download the profile from the project server https://projects.gwdg.de/projects/cluster and import them to your local Matlab application Fig.2. Afterwards it is recommended to run the test routines which are checking your configuration. It is very important that they are all marked as passed Fig.4. Where to find the button is shown in Fig.1. It might be that the system is asking for authentication ones, therefore please connect with your regular office computers credentials, the dialog window which appears looks like in Fig.3 Abbildung 1: Manage Cluster Abbildung 2: Import or find Clusters Abbildung 3: Connect to cluster with AD credentials Press here and click manage Clusters
  • 3.
    Abbildung 4: TestCluster Connectivity III. Monitor the jobs On the local computer one has a program called job manager, which is used to monitor the cluster resources. If for instance a job hangs up or one wants to chancel, this program is the necessary tool. In Fig.5 the typical layout of the job manager is displayed, to chancel you job, right click on it and chancel. To control different clusters, one needs to set the job manager to right cluster headnode, which is shown in Fig. 6. It is very important that you kill your jobs if they hang up, otherwise the other users of the cluster cannot use it at full resource level. Abbildung 5:Job Manager Abbildung 6: Select Head Node
  • 4.
    IV. Programming Tutorial The programs which are explained in this tutorial are available for direct use in Matlab Editor, please visit the project server: https://projects.gwdg.de/projects/cluster and download folder example files. Add them to your local Matlab path otherwise the interpreter cannot find them. First you will need to select the parallel configuration; this can be a cluster or your local machine if it consists of several CPU cores. Fig.7. Abbildung 7:Select Profile For more information on configurations and programming with user configurations, see: http://www.mathworks.com/access/helpdesk/help/toolbox/distcomp/index.html?/access/helpdesk/help/toolbox/distcomp/f5-16141.html#f5-16540
  • 5.
    A. Using anInteractive MATLAB pool To interactively run your parallel code, you first need to open a MATLAB pool. This reserves a collection of MATLAB worker sessions to run your code. The MATLAB pool can consist of MATLAB sessions running on your local machine or on a remote cluster. In this case, we are initially running on your local machine. You can use matlabpool open to start an interactive worker pool. If the number of workers is not defined, the default number defined in your configuration will be used. A good rule of thumb is to not open more workers then cores available. If the Configuration argument is not provided, matlabpool will use the default configuration as setup in the beginning of this section. When you are finished running with your MATLAB pool, you can close it using matlabpool close. Two of the main parallel constructs that can be run on a MATLAB pool are parfor loops (parallel for-loops) and spmd blocks (single program - multiple data blocks). Both constructs allow for a straight- forward mixture of serial and parallel code. parfor loops are used for task-parallel (i.e. embarrassingly parallel) applications. parfor is used to speed up your code. Below is a simple for loop converted into a parfor to run in parallel, with different iterations of the loop running on different workers. The code outside the parfor loop executes as traditional MATLAB code (serially, in your client MATLAB session). Different workers. The code outside the parfor loop executes as traditional MATLAB code (serially, in your client MATLAB session). Note: The example below is located in the m-file, ‘parforExample1.m’. matlabpool open 2 % can adjust according to your resources N = 100; M = 200; a = zeros(N,1); tic; % serial (regular) for-loop for i = 1:N a(i) = a(i) + max(eig(rand(M))); end toc; tic; % parallel for-loop parfor i = 1:N a(i) = a(i) + max(eig(rand(M))); end toc; matlabpool close spmd blocks are a single program multiple data (SPMD) language construct. 
The "single program" aspect of spmd means that the identical code runs on multiple labs. The code within the spmd body executes simultaneously on the MATLAB workers. The "multiple data" aspect means that even though the spmd statement runs identical code on all workers, each worker can have different, unique data for that code. spmd blocks are useful when dealing with large data that cannot fit on a single machine. Unlike parfor, spmd blocks support inter-worker communication. They allow:  Arrays (and operations on them) to be distributed across multiple workers  Messages to be explicitly passed amongst workers.
  • 6.
    The example belowcreates a distributed array (different parts of the array are located on different workers) and computes the svd of this distributed array. The spmd block returns the data in the form of a composite object (behaves similarly to cells in serial MATLAB. For specifics, see the documentation link below). Note: The example below is located in the m-file, ‘spmdExample1.m’. matlabpool open 2 % can adjust according to your resources M = 200; spmd N = rand(M,M,codistributor); % 200x100 chunk per worker A = svd(N); end A = max(A{1}); % Indexing into the composite object disp(A) clear N matlabpool close For information on matlabpool, see: http://www.mathworks.com/access/helpdesk/help/toolbox/distcomp/index.html?/access/helpdesk/help/toolbox/distcomp/ matlabpool.html For information about getting started using parfor loops, see: http://www.mathworks.com/access/helpdesk/help/toolbox/distcomp/index.html?/access/helpdesk/help/toolbox/distcomp/ brb2x2l-1.html For information about getting started using spmd blocks, see: http://www.mathworks.com/access/helpdesk/help/toolbox/distcomp/index.html?/access/helpdesk/help/toolbox/distcomp/ brukbno-2.html For information regarding composite objects: http://www.mathworks.com/access/helpdesk/help/toolbox/distcomp/index.html?/access/helpdesk/help/toolbox/distcomp/ brukctb-1.html For information regarding distributed arrays: http://www.mathworks.com/access/helpdesk/help/toolbox/distcomp/index.html?/access/helpdesk/help/toolbox/distcomp/bqi9fln-1.html
1. Using batch to Submit Serial Code – Best Practice for Scripts

batch sends your serial script to run on one worker in your cluster. All of the variables in your client workspace (i.e. the MATLAB process you are submitting from) are sent to the worker by default. You can alternatively pass a subset of these variables by defining the Workspace argument and passing the desired variables in a structure. After your job has finished, you can use the load command to retrieve the results from the worker workspace back into your client workspace. In this and all following examples, we use wait to ensure the job is done before we load the worker workspace back in. This is optional, but you cannot load the data from a task or job until that task or job is finished, so we use wait to block the MATLAB command line until that occurs. If the Configuration argument is not provided, batch will use the default configuration that was set up above.
Note: For this example to work, you will need ‘testBatch.m’ on the machine that you are submitting from (i.e. the client machine). The example below is located in the m-file ‘submitJob2a.m’.

    %% This script submits a serial script using batch
    job2a = batch('testBatch');
    wait(job2a); % can only load when job is finished
    sprintf('Finished Running Job')
    load(job2a); % loads all variables back
    sprintf('Loaded Variables into Workspace')
    % load(job2a, 'A'); % only loads variable A
    destroy(job2a) % permanently removes job data
    sprintf('Test Completed')

Abbildung 8: Batch Job

If you have submitted successfully, you should see the following variables appear in your client workspace:

Abbildung 9: Workspace

For more information on batch, see:
http://www.mathworks.com/access/helpdesk/help/toolbox/distcomp/index.html?/access/helpdesk/help/toolbox/distcomp/batch.html
and here:
http://www.mathworks.com/access/helpdesk/help/toolbox/distcomp/index.html?/access/helpdesk/help/toolbox/distcomp/brjw1e5-1.html#brjw1fx-3

2. Using batch to Submit Scripts that Run Using a MATLAB Pool

batch with the 'matlabpool' option sends scripts containing parfor or spmd to run on workers via a MATLAB pool. In this process, one worker behaves like a MATLAB client process that facilitates the distribution of the job amongst the workers in the pool and runs the serial portion of the script. Therefore, specifying a 'matlabpool' of size N will actually result in N+1 workers being used. Just like in step 2a, all variables are automatically sent from your client workspace (i.e. the workspace of the MATLAB you are submitting from) to the workers' workspaces on the cluster. load then brings the results from the worker workspace back into your client workspace. If a configuration is not specified, batch uses the default configuration as defined in the beginning of this section.
Note: For this example to work, you will need ‘testParforBatch.m’ on the machine that you are submitting from (i.e. the client machine). The example below is located in the m-file ‘submitJob2b.m’.

    %% This script submits a parfor script using batch
    job2b = batch('testParforBatch','matlabpool',2);
    wait(job2b); % can only load when job is finished
    sprintf('Finished Running Job')
    load(job2b); % loads all variables back
    sprintf('Loaded Variables into Workspace')
    % load(job2b, 'A'); % only loads variable A
    destroy(job2b) % permanently removes job data
    sprintf('Test Completed')

If you have submitted successfully, you should see the following variables appear in your client workspace:

Abbildung 10: Workspace Batch Pool
The above code submitted a script containing a parfor. You can submit a script containing an spmd block in the same fashion by changing the name of the submission script in the batch command.
Note: For this example to work, you will need ‘testSpmdBatch.m’ on the machine that you are submitting from (i.e. the client machine). The example below is located in the m-file ‘submitJob2b_spmd.m’.

    %% This script submits an spmd script using batch
    job2b = batch('testSpmdBatch','matlabpool',2);
    wait(job2b); % can only load when job is finished
    sprintf('Finished Running Job')
    load(job2b); % loads all variables back
    sprintf('Loaded Variables into Workspace')
    % load(job2b, 'A'); % only loads variable A
    destroy(job2b) % permanently removes job data
    sprintf('Test Completed')

If you have submitted successfully, you should see the following variables appear in your client workspace:

Abbildung 11: Batch Pool SPMD
B. Run Task-Parallel Example with Jobs and Tasks

In this example, we are sending a task-parallel job with multiple tasks. Each task evaluates the built-in MATLAB function sum. The createTask function in the example below is passed the job, the function to be run in the form of a function handle (@sum), the number of output arguments of the function (1), and the input argument to the sum function in the form of a cell array ({[1 1]}). If not given a configuration, findResource uses the scheduler found in the default configuration defined in the beginning of this section.
Note: This example is located in the m-file ‘submitJob3a.m’.

    %% This script submits a job with 3 tasks
    sched = findResource();
    job3a = createJob(sched);
    createTask(job3a, @sum, 1, {[1 1]});
    createTask(job3a, @sum, 1, {[2 2]});
    createTask(job3a, @sum, 1, {[3 3]});
    submit(job3a)
    waitForState(job3a, 'finished') % optional
    sprintf('Finished Running Job')
    results = getAllOutputArguments(job3a);
    sprintf('Got Output Arguments')
    destroy(job3a) % permanently removes job data
    sprintf('Test Completed')

If you have submitted successfully, you should see the following variables appear in your client workspace:

Abbildung 12: Parallel Task

results should contain the following:

Abbildung 13: Terminal Output Task Parallel
You can also call a user-created function in the same way as shown above. In that case, you will need to make sure that any scripts, files, or functions that the task function uses are accessible to the cluster. You can do this by sending those files to the cluster via the FileDependencies property, or by directing the workers to a shared directory containing those files via the PathDependencies property. An example of using FileDependencies is shown below.
Note: You will need a ‘testTask.m’ file on the machine you are submitting from for this example to work. This example is located in the m-file ‘submitJob3b.m’.

    %% This script submits a job with 3 tasks
    sched = findResource();
    job3b = createJob(sched,'FileDependencies',{'testTask.m'});
    createTask(job3b, @testTask, 1, {1,1});
    createTask(job3b, @testTask, 1, {2,2});
    createTask(job3b, @testTask, 1, {3,3});
    submit(job3b)
    waitForState(job3b, 'finished') % optional
    sprintf('Finished Running Job')
    results = getAllOutputArguments(job3b);
    sprintf('Got Output Arguments')
    destroy(job3b) % permanently removes job data
    sprintf('Test Completed')

If you have submitted successfully, you should see the following variables appear in your client workspace:

Abbildung 14: Task Parallel Workspace
results should contain the following:

Abbildung 15: Task Parallel Output

For more information on File and Path Dependencies, see the documentation below.
File Dependencies:
http://www.mathworks.com/access/helpdesk/help/toolbox/distcomp/index.html?/access/helpdesk/help/toolbox/distcomp/filedependencies.html
Path Dependencies:
http://www.mathworks.com/access/helpdesk/help/toolbox/distcomp/index.html?/access/helpdesk/help/toolbox/distcomp/pathdependencies.html
A more general overview about sharing code between client and workers:
http://www.mathworks.com/access/helpdesk/help/toolbox/distcomp/index.html?/access/helpdesk/help/toolbox/distcomp/bqur7ev-2.html#bqur7ev-9
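The PathDependencies alternative mentioned above can be sketched as follows. This is a hypothetical variant of the FileDependencies example, not one of the tutorial's m-files; the share \\fileserver\clustercode is a placeholder, so substitute a path that every node can reach. Instead of copying testTask.m to the workers, the workers are pointed at a shared folder that already contains it:

```matlab
% Hypothetical sketch: workers load testTask.m from a shared directory
sched = findResource();
job3c = createJob(sched, 'PathDependencies', {'\\fileserver\clustercode'});
createTask(job3c, @testTask, 1, {1,1});
submit(job3c)
waitForState(job3c, 'finished') % optional
results = getAllOutputArguments(job3c);
destroy(job3c) % permanently removes job data
```

PathDependencies avoids shipping files over the network for every job, at the price of requiring a share that is visible to all nodes.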
C. Run Task-Parallel Example with a MATLAB Pool Job – Best Practice for parfor or spmd in Functions

In this example, we are sending a MATLAB pool job with a single task. This is nearly equivalent to sending a batch job (see step 2b) with a parfor or an spmd block, except this method is best used when sending functions and not scripts. It behaves just like the jobs/tasks explained in step 3. The function referenced in the task contains a parfor.
Note: For this example to work, you will need ‘testParforJob.m’ on the machine that you are submitting from (i.e. the client machine). This example is located in the m-file ‘submitJob4.m’.

    %% This script submits a function that contains parfor
    sched = findResource();
    job4 = createMatlabPoolJob(sched,'FileDependencies',...
        {'testParforJob.m'});
    createTask(job4, @testParforJob, 1, {});
    set(job4, 'MaximumNumberOfWorkers', 3);
    set(job4, 'MinimumNumberOfWorkers', 3);
    submit(job4)
    waitForState(job4, 'finished') % optional
    sprintf('Finished Running Job')
    results = getAllOutputArguments(job4);
    sprintf('Got Output Arguments')
    destroy(job4) % permanently removes job data
    sprintf('Test Completed')

If you have submitted successfully, you should see the following variables appear in your client workspace:

Abbildung 16: Workspace Variables SPMD in Functions

results{1} should contain a [50x1 double]. For more information on creating and submitting MATLAB pool jobs, see:
http://www.mathworks.com/access/helpdesk/help/toolbox/distcomp/index.html?/access/helpdesk/help/toolbox/distcomp/creatematlabpooljob.html
D. Run Data-Parallel Example

In this step, we are sending a data-parallel job with a single task. The format is similar to that of jobs/tasks (see step 3). For parallel jobs, you only have one task. That task refers to a function that uses distributed arrays, labindex, or some MPI functionality. In this case, we are running a simple built-in function (labindex) which takes no inputs and returns a single output. labindex returns the ID value of each worker process that ran it. The value of labindex spans from 1 to n, where n is the number of labs running the current job.
Note: This example is located in the m-file ‘submitJob5.m’.

    %% Script submits a data parallel job, with one task
    sched = findResource();
    job5 = createParallelJob(sched);
    createTask(job5, @labindex, 1, {});
    set(job5, 'MaximumNumberOfWorkers', 3);
    set(job5, 'MinimumNumberOfWorkers', 3);
    submit(job5)
    waitForState(job5, 'finished') % optional
    sprintf('Finished Running Job')
    results = getAllOutputArguments(job5);
    sprintf('Got Output Arguments')
    destroy(job5); % permanently removes job data
    sprintf('Test Completed')

If you have submitted successfully, you should see the following variables appear in your client workspace:

Abbildung 17: Workspace Data Parallel
results should contain the following:

Abbildung 18: Results Data Parallel

For more information on creating and submitting data parallel jobs, see:
http://www.mathworks.com/access/helpdesk/help/toolbox/distcomp/index.html?/access/helpdesk/help/toolbox/distcomp/createparalleljob.html
For more information on labindex, see:
http://www.mathworks.com/access/helpdesk/help/toolbox/distcomp/index.html?/access/helpdesk/help/toolbox/distcomp/labindex.html

E. Node GPU Processing

If one needs to accelerate the execution even further, a good strategy is to use the built-in GPU (CUDA) functions of the MATLAB distributed computing toolbox. Set the cluster profile to SKYNET and open a pool of workers:

    matlabpool open …

The scheduler located on the head node of the cluster will recognize that your job has GPU code on board and will automatically switch the scheduling profile to dispatch only to the nodes which have a GPU integrated. In any case, it is a good idea to catch the error in case the scheduler does not work properly. The next block of source code shows an example of how to do that.
    function testGPUInParfor()
        spmd
            selectGPUDeviceForLab();
        end
        parfor i = 1:1000
            % Each iteration will generate some data A
            A = rand(5555);
            if selectGPUDeviceForLab()
                A = gpuArray(A);
                disp( 'Do it on the GPU' )
            else
                disp( 'Do it on the host' )
            end
            % replace the following line with whatever task you need to do
            S = sum(A,1);
            % Maybe collect back from GPU (gather is a no-op if not on the GPU)
            S = gather(S);
        end
    end

    function ok = selectGPUDeviceForLab()
        persistent hasGPU;
        if isempty( hasGPU )
            devIdx = mod(labindex-1,gpuDeviceCount())+1;
            try
                dev = gpuDevice( devIdx );
                hasGPU = dev.DeviceSupported;
            catch %#ok
                hasGPU = false;
            end
        end
        ok = hasGPU;
    end

F. Avoid Errors – Use the Compiler to Your Advantage

It may be counterintuitive, but in a parallel loop or parallel data block, each single iteration of that particular block runs as a parallel task, i.e. on a worker. Therefore the iterations must be iteration-safe, since there is no guarantee that the iterations run in ascending order. What I prefer to do is borrow the map-reduce approach; of course we don't reduce anything here, but the design pattern is great for saving you some headaches. In the reduce step, a piece of data and a function are scheduled to a worker, where the reduce function produces a return value. (In real map-reduce one can then proceed and use the output of step (x-1) in step (x), which we normally don't.) I can only advise: make extensive use of function return values. The idea is that if a chunk of data is distributed along a dimension, or a chunk of iterations is distributed over the total number of iterations, every slice goes to the worker via a function call and comes back to the parallel block via a return value. Besides, this has advantages for the underlying MPI, which would otherwise shovel all variables via multicast to all workers instead of only to one worker, but this is beyond the scope of this document.
In addition, it is very clever to write your functions in a way that they can run in both the local and the cluster environment. For reference, see the example folder "sofi" on the project server.

Problem:

    parfor k = 1:n
        u = u*v;
    end

Better:

    parfor k = 1:n
        u = multi(u,v);
    end

G. Summary Chart for Scheduling Options

Abbildung 19: Scheduler Options
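The "function return" pattern from section F can be sketched as a complete function. This is a minimal, hypothetical example (runSliced and processChunk are placeholder names, not files from the tutorial): each parfor iteration hands exactly one slice to a worker via a function call and receives the result via the return value, so no iteration depends on any other.

```matlab
function results = runSliced(data)
% Hypothetical sketch of the function-return pattern from section F.
% data: a matrix whose columns are independent slices.
results = zeros(1, size(data,2));
parfor k = 1:size(data,2)
    results(k) = processChunk(data(:,k)); % sliced input, sliced output
end
end

function r = processChunk(slice)
% Placeholder for whatever work one slice actually needs.
r = sum(slice.^2);
end
```

Because results is a sliced output variable and each call touches only its own column, MATLAB only ships one slice to each worker instead of broadcasting everything.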
V. Glossary:
HPGPU: High-performance graphics card processing unit.
Worker: A worker executes one parallel task. In other words, if one has 100 workers available, one can run 100 iterations of a parallel loop in one time interval (tick).
Node: A node is a physical machine; for example, a computer connected to a cluster is a node of that very cluster. Such a node has 16 workers if it has four CPUs with four cores each.
MPI: Message Passing Interface, a piece of software which distributes processes around a grid via RPCs.
RPC: Remote Procedure Call.