Job lifecycle

/O=dutchgrid/O=users/O=universiteit-utrecht/OU=chem/CN=Nuno Loureiro Ferreira
20100127 @ uu.nl
Life-cycle of a Grid Computing Job
with some side stories

1/24
Outline
• Grid & Science - EGEE
• Virtual Organizations
• enmr.eu architecture
• Grid Job Life Cycle
• Hello Grid!
• CNS tutorial
• Web Portals

2/24
The Grid
“Coordinated resource sharing and problem solving in dynamic,
multi-institutional virtual organizations”.
Foster, I. et al., Int. J. Supercomput. Appl. 15(3), 2001

3/24
Why do scientists need the Grid?
High-energy physics (15 PB/year)
15 PB ~ 20*10^6 CDs
Genome projects, data mining,
Tackling protein folding,
Protein structure, …
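A rough check of the CD figure above. The slide does not state its units, so this sketch assumes decimal units (1 PB = 10^15 bytes) and a 700 MB CD; the function name is made up for illustration.

```shell
# Assumptions (not from the slide): 1 PB = 10^15 bytes, 1 CD = 700 MB = 7 * 10^8 bytes.
# CDs per PB = 10^15 / (7 * 10^8) = 10^7 / 7, so multiply first and divide last.
cds_for_petabytes() {
    echo $(( $1 * 10000000 / 7 ))
}

cds_for_petabytes 15   # about 21 million, in line with "~ 20*10^6"
```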

4/24
Enabling Grids for E-science
GStat (Jan 2010) : http://goc.grid.sinica.edu.tw/gstat/
Infrastructure
• 317 sites
• 58 countries
• ~ 140K CPUs 24/7
• ~ 69 PB disk
Users
• 182 registered VOs
• ~ 12K registered users
• > 300K jobs / day

5/24
Registered EGEE Virtual Organizations
Application domain     Active VOs   Users
High-energy Physics    41            4737
Infrastructures        28            2365
Life Sciences          10             519
...                    ...            ...
Total                  182          11908
http://cic.gridops.org/index.php?section=home&page=volist
VO name   Scope    Registered Users (20090210)   Registered Users (20100125)
biomed    Global   223                           257
enmr.eu   Global   54                            155

VO Registered Users
6/24
Stats: 20100125

7/24
How to become an enmr.eu user?
http://ca.dutchgrid.nl/request/
/O=dutchgrid/O=users/O=universiteit-utrecht/OU=chem/CN=Your Name

enmr.eu Grid architecture
8/24

The (not so short) Job Life-cycle
10/24
www.gridcafe.org

Authentication and Authorization (1/2)
11/24
[nuno@ui-enmr ~]$ ll ~/.globus
total 16
-rw-r--r-- 1 nuno users 2189 Nov 14 17:18 usercert.p12
-rw-r--r-- 1 nuno users 4947 Nov 14 17:19 usercert.pem
-rw------- 1 nuno users 963 Nov 14 17:20 userkey.pem
[nuno@ui-enmr ~]$ voms-proxy-init --voms enmr.eu
Cannot find file or dir: /home/nuno/.glite/vomses
Enter GRID pass phrase:
Your identity: /O=dutchgrid/O=users/O=universiteit-utrecht/OU=chem/CN=Nuno Loureiro Ferreira
Creating temporary proxy ........................... Done
Contacting voms-02.pd.infn.it:15014 [/C=IT/O=INFN/OU=Host/L=Padova/CN=voms-02.pd.infn.it]
"enmr.eu" Done
Creating proxy .......................... Done
Your proxy is valid until Wed Jan 27 03:44:48 2010
[nuno@ui-enmr ~]$ grid-cert-info -s -i -sd -ed
/O=dutchgrid/O=users/O=universiteit-utrecht/OU=chem/CN=Nuno Loureiro Ferreira
/C=NL/O=NIKHEF/CN=NIKHEF medium-security certification auth
Oct 23 00:00:00 2009 GMT
Oct 23 15:15:43 2010 GMT
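In the listing above, userkey.pem is readable by its owner only (-rw-------). Proxy tools typically refuse a private key with looser permissions, so a quick pre-flight check can save a confusing error. A minimal sketch; key_is_private is a hypothetical helper and it assumes GNU stat.

```shell
# Return 0 if the key file has mode 600 or 400 (owner access only).
# Assumes GNU coreutils stat; on BSD/macOS the equivalent is: stat -f '%Lp' FILE
key_is_private() {
    mode=$(stat -c '%a' "$1") || return 2
    [ "$mode" = "600" ] || [ "$mode" = "400" ]
}

# Usage on the User Interface machine:
#   key_is_private ~/.globus/userkey.pem || chmod 600 ~/.globus/userkey.pem
```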

Authentication and Authorization (2/2)
12/24
[nuno@ui-enmr ~]$ voms-proxy-init --voms enmr.eu
Cannot find file or dir: /home/nuno/.glite/vomses
Enter GRID pass phrase:
Your identity: /O=dutchgrid/O=users/O=universiteit-utrecht/OU=chem/CN=Nuno Loureiro Ferreira
Creating temporary proxy ............................................... Done
Contacting voms2.cnaf.infn.it:15014 [/C=IT/O=INFN/OU=Host/L=CNAF/CN=voms2.cnaf.infn.it] "enmr.
Creating proxy ............................................. Done
Your proxy is valid until Wed Jan 27 03:54:00 2010
[nuno@ui-enmr ~]$ voms-proxy-info
subject : /O=dutchgrid/O=users/O=universiteit-utrecht/OU=chem/CN=Nuno Loureiro Ferreira/CN=pr
issuer : /O=dutchgrid/O=users/O=universiteit-utrecht/OU=chem/CN=Nuno Loureiro Ferreira
identity : /O=dutchgrid/O=users/O=universiteit-utrecht/OU=chem/CN=Nuno Loureiro Ferreira
type : proxy
strength : 1024 bits
path : /tmp/x509up_u500
timeleft : 11:56:19
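The timeleft field matters for long jobs: a proxy that expires while the job is queued or running usually causes the job to fail. A small sketch that converts the HH:MM:SS value shown above into seconds and compares it with a threshold; both function names are illustrative, and the parsing assumes the output format shown on this slide.

```shell
# Convert an HH:MM:SS string (as printed after "timeleft :") into seconds.
# awk reads "08" as decimal 8, so leading zeros are safe.
timeleft_seconds() {
    echo "$1" | awk -F: '{ print $1 * 3600 + $2 * 60 + $3 }'
}

# Succeed if the proxy has fewer than $2 seconds left.
proxy_needs_renewal() {
    [ "$(timeleft_seconds "$1")" -lt "$2" ]
}

# Usage, assuming voms-proxy-info prints the field as on the slide:
#   left=$(voms-proxy-info | sed -n 's/^timeleft *: *//p')
#   proxy_needs_renewal "$left" 3600 && voms-proxy-init --voms enmr.eu
```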

Available resources
14/24
[nuno@ui-enmr bcbr]$ lcg-infosites --vo enmr.eu ce all
#CPU Free Total Jobs Running Waiting ComputingElement
----------------------------------------------------------
399 20 85 57 28 grid-ce-01.ba.infn.it:2119/jobmanager-lcgpbs-short
16 7 9 9 0 ce-enmr.chem.uu.nl:2119/jobmanager-lcgpbs-medium
88 88 0 0 0 glite-ce.grid.uj.ac.za:8443/cream-pbs-long
2460 906 103 103 0 trekker.nikhef.nl:2119/jobmanager-pbs-medium
1632 1584 45 45 0 deimos.htc.biggrid.nl:2119/jobmanager-pbs-medium
200 0 0 0 0 t2-ce-05.lnl.infn.it:8443/cream-lsf-enmr1
… snip …
Avail Space(Kb) Used Space(Kb) Type SEs
----------------------------------------------------------
2444576886 555136905 n.a prod-se-01.pd.infn.it
3127661680 1371977164 n.a prod-se-02.pd.infn.it
1858674692 106001211 n.a se-enmr.chem.uu.nl
13828076063 21152016643 n.a se01.dur.scotgrid.ac.uk
… snip …
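The CE table above can be filtered to find queues that look idle. A minimal sketch, assuming the six-column layout shown on this slide (#CPU, Free, Total Jobs, Running, Waiting, ComputingElement); idle_ces is a hypothetical helper, not part of the middleware.

```shell
# Read lcg-infosites CE output on stdin and print the queues that have
# free CPUs (column 2 > 0) and no waiting jobs (column 5 == 0).
# Header, separator, and SE rows are skipped because they do not have
# exactly six fields starting with a number.
idle_ces() {
    awk 'NF == 6 && $1 ~ /^[0-9]+$/ && $2 > 0 && $5 == 0 { print $6 }'
}

# Usage:
#   lcg-infosites --vo enmr.eu ce | idle_ces
```

For the rows shown on the slide this would keep ce-enmr.chem.uu.nl, glite-ce.grid.uj.ac.za, trekker.nikhef.nl and deimos.htc.biggrid.nl, and drop grid-ce-01.ba.infn.it (28 waiting) and t2-ce-05.lnl.infn.it (0 free).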

Submit a job
15/24
[nuno@ui-enmr bcbr]$ glite-wms-job-submit -a -o jid hello.jdl
Connecting to the service https://wms-enmr.chem.uu.nl:7443/glite_wms_wmproxy_server
====================== glite-wms-job-submit Success ======================
The job has been successfully submitted to the WMProxy
Your job identifier is:
https://lb-enmr.chem.uu.nl:9000/gOtqQuG4ebqpz3m5z8_2Eg
The job identifier has been saved in the following file:
/home/nuno/grid/hello/bcbr/jid
==========================================================================
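The slides do not show hello.jdl or the script it runs. A plausible minimal pair, reconstructed from the output file names (hello.out, hello.err) and the message on the "Retrieve Job Output" slide, might look like the following. The script name hello.sh and the helper write_hello_job are assumptions, not the author's files.

```shell
# Write a hypothetical hello.sh and hello.jdl into directory $1 (default: .).
write_hello_job() {
    dir=${1:-.}
    mkdir -p "$dir" || return 1

    # The job itself: report which worker node ran it.
    cat > "$dir/hello.sh" <<'EOF'
#!/bin/sh
echo "Hello Grid! I was here : $(hostname -f)"
EOF
    chmod +x "$dir/hello.sh"

    # The job description: what to run, what to ship, what to bring back.
    cat > "$dir/hello.jdl" <<'EOF'
Executable    = "hello.sh";
StdOutput     = "hello.out";
StdError      = "hello.err";
InputSandbox  = {"hello.sh"};
OutputSandbox = {"hello.out", "hello.err"};
EOF
}
```

After `write_hello_job .`, the submission command shown above (`glite-wms-job-submit -a -o jid hello.jdl`) would apply unchanged.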

Query Job Status
16/24
[nuno@ui-enmr bcbr]$ glite-wms-job-status -i jid
*************************************************************
BOOKKEEPING INFORMATION:
Status info for the Job : https://lb-enmr.chem.uu.nl:9000/gOtqQuG4ebqpz3m5z8_2Eg
Current Status: Scheduled
Status Reason: Job successfully submitted to Globus
Destination: pbs-enmr.cerm.unifi.it:2119/jobmanager-lcgpbs-verylong
Submitted: Tue Jan 26 16:26:07 2010 CET
*************************************************************
[nuno@ui-enmr bcbr]$ glite-wms-job-status -i jid
*************************************************************
BOOKKEEPING INFORMATION:
Status info for the Job : https://lb-enmr.chem.uu.nl:9000/gOtqQuG4ebqpz3m5z8_2Eg
Current Status: Done (Success)
Exit code: 0
Status Reason: Job terminated successfully
Destination: pbs-enmr.cerm.unifi.it:2119/jobmanager-lcgpbs-verylong
Submitted: Tue Jan 26 16:26:07 2010 CET
*************************************************************
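Rather than re-running the status command by hand, the "Current Status" line can be polled until the job reaches a final state. A sketch under the assumption that the output looks like the two listings above; job_state, is_terminal_state and wait_for_job are illustrative names.

```shell
# Extract the value of the "Current Status:" line from status output on stdin.
job_state() {
    sed -n 's/^[[:space:]]*Current Status:[[:space:]]*//p'
}

# Done, Aborted, Cancelled and Cleared are final; anything else is still moving.
is_terminal_state() {
    case $1 in
        Done*|Aborted*|Cancelled*|Cleared*) return 0 ;;
        *) return 1 ;;
    esac
}

# Poll the job whose identifier is stored in file $1 every $2 seconds (default 60).
wait_for_job() {
    while :; do
        state=$(glite-wms-job-status -i "$1" | job_state)
        [ -n "$state" ] || return 1          # no status line: give up
        echo "$(date '+%H:%M:%S') $state"
        is_terminal_state "$state" && return 0
        sleep "${2:-60}"
    done
}

# Usage:
#   wait_for_job jid 120 && glite-wms-job-output -i jid --dir ./out
```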

Retrieve Job Output
17/24
[nuno@ui-enmr bcbr]$ glite-wms-job-output -i jid --dir ./out
Connecting to the service https://wms-enmr.chem.uu.nl:7443/glite_wms_wmproxy_server
================================================================================
JOB GET OUTPUT OUTCOME
Output sandbox files for the job:
https://lb-enmr.chem.uu.nl:9000/gOtqQuG4ebqpz3m5z8_2Eg
have been successfully retrieved and stored in the directory:
/home/nuno/grid/hello/bcbr/out
================================================================================
[nuno@ui-enmr bcbr]$ ll ./out/
total 4
-rw-r--r-- 1 nuno users 0 Jan 26 17:31 hello.err
-rw-r--r-- 1 nuno users 48 Jan 26 17:31 hello.out
[nuno@ui-enmr bcbr]$ more ./out/hello.out
Hello Grid! I was here : wn3-enmr.cerm.unifi.it

CNS example (1/3)
18/24
[nuno@ui-enmr cns-example]$ ll
total 160
-rw-r--r-- 1 nuno users 144884 Mar 18 2009 cns-input.tgz
-rw-r--r-- 1 nuno users 1529 Mar 18 2009 README
-rwxr-xr-x 1 nuno users 134 Mar 18 2009 run-cns
-rw-r--r-- 1 nuno users 229 Jan 17 17:58 run-cns.jdl
[nuno@ui-enmr cns-example]$ tar tvzf cns-input.tgz
-rw-r--r-- abonvin/staff 30070 2008-05-06 12:42:33 CaMM13Tmpcs1.tbl
-rw-r--r-- abonvin/staff 16946 2008-05-06 12:42:33 CaMM13Tmrdc1.tbl
-rw-r--r-- abonvin/staff 912 2008-05-06 12:44:53 README
-rw-r--r-- abonvin/staff 208142 2008-05-06 12:42:33 calmodulin-MM13.pdb
-rw-r--r-- abonvin/staff 341327 2008-05-06 12:42:33 calmodulin-MM13.psf
-rw-r--r-- abonvin/staff 4982 2008-05-06 12:42:33 ion.param
-rw-r--r-- abonvin/staff 158398 2008-05-06 12:42:33 noes.tbl
-rw-r--r-- abonvin/staff 548 2008-05-06 12:42:33 par_axis.pro
-rw-r--r-- abonvin/staff 74090 2008-05-06 12:42:33 parallhdg5.3.pro
-rw-r--r-- abonvin/staff 16549 2008-05-06 12:42:33 phipsi.tbl
-rw-r--r-- abonvin/staff 9571 2008-05-06 12:42:33 sa-test.inp
-rw-r--r-- abonvin/staff 273 2008-05-06 12:42:33 tensor.pdb
-rw-r--r-- abonvin/staff 1181 2008-05-06 12:42:33 tensor.psf
-rw-r--r-- abonvin/staff 57 2008-05-06 12:42:33 tensor.tbl
http://www.enmr.eu/eNMR-tutorials

CNS example (2/3)
19/24
[nuno@ui-enmr cns-example]$ more run-cns
source $VO_ENMR_EU_SW_DIR/BCBR/cns/1.2-para/set_cns.bash
tar xfz cns-input.tgz
cns < sa-test.inp > sa-test.out
tar cvfz cns-output.tgz *
[nuno@ui-enmr cns-example]$ more run-cns.jdl
Executable = "run-cns";
StdOutput = "std.out";
StdError = "std.err";
InputSandbox = {"cns-input.tgz","run-cns"};
OutputSandbox = {"std.out", "std.err","cns-output.tgz"};
Requirements = RegExp ("chem.uu.nl",other.GlueCEUniqueId);
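Every file named in InputSandbox must exist on the User Interface machine at submission time. A quick local check before submitting could look like this; check_input_sandbox is a hypothetical helper, and it assumes the InputSandbox list sits on one line with relative file names without spaces, as in the JDL above.

```shell
# Verify that each file listed in the JDL's InputSandbox exists next to the JDL.
# Returns 0 if all are present, 1 otherwise (missing names go to stderr).
check_input_sandbox() {
    jdl=$1
    dir=$(dirname "$jdl")
    missing=0
    for f in $(sed -n 's/^[[:space:]]*InputSandbox[[:space:]]*=//p' "$jdl" \
                 | grep -o '"[^"]*"' | tr -d '"'); do
        if [ ! -e "$dir/$f" ]; then
            echo "missing: $f" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Usage:
#   check_input_sandbox run-cns.jdl && glite-wms-job-submit -a -o jid run-cns.jdl
```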

CNS example (3/3)
20/24
[nuno@ui-enmr cns-example]$ glite-wms-job-submit -a -o jid run-cns.jdl
[nuno@ui-enmr cns-example]$ glite-wms-job-output -i jid --dir ./
[nuno@ui-enmr cns-example]$ ll
total 24464
-rw-r--r-- 1 nuno users 144884 Mar 18 2009 cns-input.tgz
-rw-r--r-- 1 nuno users 24854174 Jan 26 18:24 cns-output.tgz
-rw-r--r-- 1 nuno users 79 Jan 26 17:13 jid
-rw-r--r-- 1 nuno users 1529 Mar 18 2009 README
-rwxr-xr-x 1 nuno users 137 Jan 26 17:12 run-cns
-rw-r--r-- 1 nuno users 229 Jan 17 17:58 run-cns.jdl
[nuno@ui-enmr out]$ more sa_1.pdb
REMARK FILENAME="/home/enmr016/globus-tmp.wn23-enmr.25892.0/https_3a_2f_2flb-"
… snip …
REMARK DATE:26-Jan-2010 17:29:14 created by user: enmr016
REMARK VERSION:1.2
ATOM 1 HA ALA 1 1.868 27.047 -8.664 1.00 15.00 A
ATOM 2 CB ALA 1 0.511 28.488 -7.902 1.00 15.00 A
ATOM 3 HB1 ALA 1 0.379 28.981 -8.854 1.00 15.00 A
… snip …
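The slide jumps from the retrieved cns-output.tgz to viewing sa_1.pdb in an out directory, so the archive was presumably unpacked in between. A sketch for listing the generated structures before extracting, assuming they follow the sa_N.pdb naming seen above; list_sa_models is an illustrative name.

```shell
# List the sa_<number>.pdb entries in a gzipped tar archive, ignoring the
# input PDB files that "tar cvfz cns-output.tgz *" also packs.
list_sa_models() {
    tar tzf "$1" | sed 's|^\./||' | grep '^sa_[0-9][0-9]*\.pdb$'
}

# Usage:
#   list_sa_models cns-output.tgz
#   mkdir -p out && tar xzf cns-output.tgz -C out
```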

21/24
Web Portal Grid Interaction

On-going work – GROMACS WebPortal
22/24

Zwartkijken / Idées Noires - Franquin
“Life cycle of a GRID computing job?
That's something like:
conception..,
abortion..,
conception..,
birth..,
premature death..,
reanimation.., etc?
:p
T.”
20100127 – 11AM
23/24

24/24
Acknowledgments
Big-Picture layer
/O=dutchgrid/O=users/O=universiteit-utrecht/OU=chem/CN=Alexandre Bonvin
Rolf Boelens
Hardware layer
/O=dutchgrid/O=users/O=universiteit-utrecht/OU=chem/CN=Johan van der Zwan
Middleware layer
/C=IT/O=INFN/OU=Personal Certificate/L=Padova/CN=Cristina Aiftimiei
Application layer
/O=dutchgrid/O=users/O=universiteit-utrecht/OU=chem/CN=Marc van Dijk
/O=dutchgrid/O=users/O=universiteit-utrecht/OU=chem/CN=Sjoerd De Vries
/O=dutchgrid/O=users/O=universiteit-utrecht/OU=chem/CN=Tsjerk Wassenaar
User layer
/O=dutchgrid/O=users/O=universiteit-utrecht/OU=chem/CN=*.*

Service Availability Monitoring
25/24

Grid Operations Center Data Base
26/24

Building a Grid
27/24
1. The architecture
2. The hardware
3. The middleware
Network
Resources
Middleware
Application
User-centric
