/O=dutchgrid/O=users/O=universiteit-utrecht/OU=chem/CN=Nuno Loureiro Ferreira
20100127 @ uu.nl
Life-cycle of a Grid Computing Job
with some side stories
1/24
Outline
- Grid & Science - EGEE
- Virtual Organizations
- enmr.eu architecture
- Grid Job Life Cycle
- Hello Grid!
- CNS tutorial
- Web Portals
2/24
The Grid
“Coordinated resource sharing and problem solving in dynamic,
multi-institutional virtual organizations”.
Foster, I. et al., Int. J. Supercomput. Appl. 15(3), 2001
3/24
Why do scientists need the Grid?
High-energy physics (15 PB/year)
15 PB ~ 20*10^6 CDs
Genome projects, data mining,
tackling protein folding,
protein structure, …
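The CD comparison can be checked with a short calculation. This is only a sketch: the capacity of 700 MB per CD and the use of decimal units are assumptions, not figures from the slides.

```python
# Back-of-the-envelope check of "15 PB ~ 20*10^6 CDs".
# Assumption (not from the slides): one CD holds about 700 MB, decimal units.
PETABYTE = 10**15
CD_CAPACITY_BYTES = 700 * 10**6


def cds_needed(petabytes: float) -> float:
    """Number of CDs required to hold the given volume in petabytes."""
    return petabytes * PETABYTE / CD_CAPACITY_BYTES


if __name__ == "__main__":
    millions = cds_needed(15) / 1e6
    print(f"15 PB is about {millions:.1f} million CDs")
```

Under these assumptions the result lands slightly above 20 million, which agrees with the rounded figure on the slide.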
4/24
Enabling Grids for E-science
GStat (Jan 2010) : http://goc.grid.sinica.edu.tw/gstat/
Infrastructure
- 317 sites
- 58 countries
- ~ 140K CPUs 24/7
- ~ 69 PB disk
Users
- 182 registered VOs
- ~ 12K registered users
- > 300K jobs / day
5/24
Registered EGEE Virtual Organizations
Application domain     Active VOs   Users
High-energy Physics    41           4737
Infrastructures        28           2365
Life Sciences          10           519
...                    ...          ...
Total                  182          11908
http://cic.gridops.org/index.php?section=home&page=volist
VO name   Scope    Registered users (20090210)   Registered users (20100125)
biomed    Global   223                           257
enmr.eu   Global   54                            155
VO Registered Users
6/24
Stats: 20100125
7/24
How to become an enmr.eu user?
http://ca.dutchgrid.nl/request/
/O=dutchgrid/O=users/O=universiteit-utrecht/OU=chem/CN=Your Name
enmr.eu Grid architecture
8/24
enmr.eu Grid Status
9/24
The (not so short) Job Life-cycle
10/24
www.gridcafe.org
Authentication and Authorization (1/2)
11/24
[nuno@ui-enmr ~]$ ll ~/.globus
total 16
-rw-r--r-- 1 nuno users 2189 Nov 14 17:18 usercert.p12
-rw-r--r-- 1 nuno users 4947 Nov 14 17:19 usercert.pem
-rw------- 1 nuno users 963 Nov 14 17:20 userkey.pem
[nuno@ui-enmr ~]$ voms-proxy-init --voms enmr.eu
Cannot find file or dir: /home/nuno/.glite/vomses
Enter GRID pass phrase:
Your identity: /O=dutchgrid/O=users/O=universiteit-utrecht/OU=chem/CN=Nuno Loureiro Ferreira
Creating temporary proxy ........................... Done
Contacting voms-02.pd.infn.it:15014 [/C=IT/O=INFN/OU=Host/L=Padova/CN=voms-02.pd.infn.it]
"enmr.eu" Done
Creating proxy .......................... Done
Your proxy is valid until Wed Jan 27 03:44:48 2010
[nuno@ui-enmr ~]$ grid-cert-info -s -i -sd -ed
/O=dutchgrid/O=users/O=universiteit-utrecht/OU=chem/CN=Nuno Loureiro Ferreira
/C=NL/O=NIKHEF/CN=NIKHEF medium-security certification auth
Oct 23 00:00:00 2009 GMT
Oct 23 15:15:43 2010 GMT
Authentication and Authorization (2/2)
12/24
[nuno@ui-enmr ~]$ voms-proxy-init --voms enmr.eu
Cannot find file or dir: /home/nuno/.glite/vomses
Enter GRID pass phrase:
Your identity: /O=dutchgrid/O=users/O=universiteit-utrecht/OU=chem/CN=Nuno Loureiro Ferreira
Creating temporary proxy ............................................... Done
Contacting voms2.cnaf.infn.it:15014 [/C=IT/O=INFN/OU=Host/L=CNAF/CN=voms2.cnaf.infn.it] "enmr.
Creating proxy ............................................. Done
Your proxy is valid until Wed Jan 27 03:54:00 2010
[nuno@ui-enmr ~]$ voms-proxy-info
subject : /O=dutchgrid/O=users/O=universiteit-utrecht/OU=chem/CN=Nuno Loureiro Ferreira/CN=pr
issuer : /O=dutchgrid/O=users/O=universiteit-utrecht/OU=chem/CN=Nuno Loureiro Ferreira
identity : /O=dutchgrid/O=users/O=universiteit-utrecht/OU=chem/CN=Nuno Loureiro Ferreira
type : proxy
strength : 1024 bits
path : /tmp/x509up_u500
timeleft : 11:56:19
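Before submitting a long job it can help to check how much proxy lifetime is left. The sketch below parses the `timeleft` line of the `voms-proxy-info` output shown above. The helper names and the one-hour threshold are hypothetical, chosen only for illustration.

```python
import re

# Excerpt of the voms-proxy-info output from the slide.
SAMPLE = """type : proxy
strength : 1024 bits
path : /tmp/x509up_u500
timeleft : 11:56:19
"""


def parse_timeleft(proxy_info: str) -> int:
    """Return the remaining proxy lifetime in seconds."""
    match = re.search(
        r"^timeleft\s*:\s*(\d+):(\d{2}):(\d{2})\s*$", proxy_info, re.MULTILINE
    )
    if match is None:
        raise ValueError("no 'timeleft' line found")
    hours, minutes, seconds = (int(group) for group in match.groups())
    return hours * 3600 + minutes * 60 + seconds


def proxy_is_usable(proxy_info: str, min_seconds: int = 3600) -> bool:
    """True if at least min_seconds remain (threshold is an assumption)."""
    return parse_timeleft(proxy_info) >= min_seconds


if __name__ == "__main__":
    print(parse_timeleft(SAMPLE), proxy_is_usable(SAMPLE))
```

If the check fails, the usual remedy is to run `voms-proxy-init --voms enmr.eu` again, as on the slide.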
Available resources
14/24
[nuno@ui-enmr bcbr]$ lcg-infosites --vo enmr.eu ce all
#CPU Free Total Jobs Running Waiting ComputingElement
----------------------------------------------------------
399 20 85 57 28 grid-ce-01.ba.infn.it:2119/jobmanager-lcgpbs-short
16 7 9 9 0 ce-enmr.chem.uu.nl:2119/jobmanager-lcgpbs-medium
88 88 0 0 0 glite-ce.grid.uj.ac.za:8443/cream-pbs-long
2460 906 103 103 0 trekker.nikhef.nl:2119/jobmanager-pbs-medium
1632 1584 45 45 0 deimos.htc.biggrid.nl:2119/jobmanager-pbs-medium
200 0 0 0 0 t2-ce-05.lnl.infn.it:8443/cream-lsf-enmr1
… snip …
Avail Space(Kb) Used Space(Kb) Type SEs
----------------------------------------------------------
2444576886 555136905 n.a prod-se-01.pd.infn.it
3127661680 1371977164 n.a prod-se-02.pd.infn.it
1858674692 106001211 n.a se-enmr.chem.uu.nl
13828076063 21152016643 n.a se01.dur.scotgrid.ac.uk
… snip …
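The computing-element table can be post-processed to find, for example, the queue with the most free CPUs. This is a minimal sketch that assumes data rows look like the ones above: five integer columns (#CPU, Free, Total Jobs, Running, Waiting) followed by the CE identifier. The class and function names are hypothetical.

```python
from typing import List, NamedTuple

# Rows copied from the lcg-infosites output on the slide.
SAMPLE = """#CPU Free Total Jobs Running Waiting ComputingElement
----------------------------------------------------------
399 20 85 57 28 grid-ce-01.ba.infn.it:2119/jobmanager-lcgpbs-short
16 7 9 9 0 ce-enmr.chem.uu.nl:2119/jobmanager-lcgpbs-medium
88 88 0 0 0 glite-ce.grid.uj.ac.za:8443/cream-pbs-long
2460 906 103 103 0 trekker.nikhef.nl:2119/jobmanager-pbs-medium
1632 1584 45 45 0 deimos.htc.biggrid.nl:2119/jobmanager-pbs-medium
200 0 0 0 0 t2-ce-05.lnl.infn.it:8443/cream-lsf-enmr1
"""


class ComputingElement(NamedTuple):
    cpus: int
    free: int
    total_jobs: int
    running: int
    waiting: int
    ce_id: str


def parse_ce_lines(text: str) -> List[ComputingElement]:
    """Keep only rows with five integers followed by a CE identifier."""
    elements = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 6 or not all(f.isdigit() for f in fields[:5]):
            continue  # header, separator, or "snip" line
        numbers = [int(f) for f in fields[:5]]
        elements.append(ComputingElement(*numbers, fields[5]))
    return elements


def most_free(elements: List[ComputingElement]) -> ComputingElement:
    """Return the computing element reporting the most free CPUs."""
    return max(elements, key=lambda ce: ce.free)


if __name__ == "__main__":
    print(most_free(parse_ce_lines(SAMPLE)).ce_id)
```

In normal use the WMS does this matchmaking itself; the sketch only makes the columns easier to read.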
Submit a job
15/24
[nuno@ui-enmr bcbr]$ glite-wms-job-submit -a -o jid hello.jdl
Connecting to the service https://wms-enmr.chem.uu.nl:7443/glite_wms_wmproxy_server
====================== glite-wms-job-submit Success ======================
The job has been successfully submitted to the WMProxy
Your job identifier is:
https://lb-enmr.chem.uu.nl:9000/gOtqQuG4ebqpz3m5z8_2Eg
The job identifier has been saved in the following file:
/home/nuno/grid/hello/bcbr/jid
==========================================================================
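The job identifier is a URL: its host and port point at the Logging and Bookkeeping (LB) server, and its path is the unique job string. The sketch below splits it with the standard library. The `read_job_ids` helper assumes that the file written by `-o jid` may contain comment lines starting with `#` alongside the identifiers; that layout is an assumption, since the slides do not show the file.

```python
from typing import List, Optional, Tuple
from urllib.parse import urlparse

JOB_ID = "https://lb-enmr.chem.uu.nl:9000/gOtqQuG4ebqpz3m5z8_2Eg"


def split_job_id(job_id: str) -> Tuple[Optional[str], Optional[int], str]:
    """Return (LB host, LB port, unique job string) for a job identifier."""
    parsed = urlparse(job_id)
    return parsed.hostname, parsed.port, parsed.path.lstrip("/")


def read_job_ids(file_text: str) -> List[str]:
    """Collect identifiers from the text of a job-id file.

    Assumption: blank lines and lines starting with '#' carry no identifier.
    """
    ids = []
    for line in file_text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            ids.append(line)
    return ids


if __name__ == "__main__":
    print(split_job_id(JOB_ID))
```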
Query Job Status
16/24
[nuno@ui-enmr bcbr]$ glite-wms-job-status -i jid
*************************************************************
BOOKKEEPING INFORMATION:
Status info for the Job : https://lb-enmr.chem.uu.nl:9000/gOtqQuG4ebqpz3m5z8_2Eg
Current Status: Scheduled
Status Reason: Job successfully submitted to Globus
Destination: pbs-enmr.cerm.unifi.it:2119/jobmanager-lcgpbs-verylong
Submitted: Tue Jan 26 16:26:07 2010 CET
*************************************************************
[nuno@ui-enmr bcbr]$ glite-wms-job-status -i jid
*************************************************************
BOOKKEEPING INFORMATION:
Status info for the Job : https://lb-enmr.chem.uu.nl:9000/gOtqQuG4ebqpz3m5z8_2Eg
Current Status: Done (Success)
Exit code: 0
Status Reason: Job terminated successfully
Destination: pbs-enmr.cerm.unifi.it:2119/jobmanager-lcgpbs-verylong
Submitted: Tue Jan 26 16:26:07 2010 CET
*************************************************************
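A script that waits for a job needs to read the `Current Status` line and decide whether the job has finished. The sketch below works on the two status reports shown above. Treating `Done`, `Aborted`, `Cancelled` and `Cleared` as final states is an assumption about the middleware; only `Scheduled` and `Done (Success)` appear on the slides.

```python
import re

SCHEDULED = """Current Status: Scheduled
Status Reason: Job successfully submitted to Globus
Destination: pbs-enmr.cerm.unifi.it:2119/jobmanager-lcgpbs-verylong
"""

DONE = """Current Status: Done (Success)
Exit code: 0
Status Reason: Job terminated successfully
"""

# Assumed final states; not all of them are shown on the slides.
FINAL_PREFIXES = ("Done", "Aborted", "Cancelled", "Cleared")


def current_status(status_text: str) -> str:
    """Extract the value of the 'Current Status' line."""
    match = re.search(r"^Current Status:\s*(.+?)\s*$", status_text, re.MULTILINE)
    if match is None:
        raise ValueError("no 'Current Status' line found")
    return match.group(1)


def is_finished(status: str) -> bool:
    """True once the job has reached one of the assumed final states."""
    return status.startswith(FINAL_PREFIXES)


if __name__ == "__main__":
    for report in (SCHEDULED, DONE):
        status = current_status(report)
        print(status, "->", "finished" if is_finished(status) else "still in progress")
```

A polling loop would call `glite-wms-job-status -i jid` at intervals and stop once `is_finished` returns true, then fetch the output sandbox.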
Retrieve Job Output
17/24
[nuno@ui-enmr bcbr]$ glite-wms-job-output -i jid --dir ./out
Connecting to the service https://wms-enmr.chem.uu.nl:7443/glite_wms_wmproxy_server
================================================================================
JOB GET OUTPUT OUTCOME
Output sandbox files for the job:
https://lb-enmr.chem.uu.nl:9000/gOtqQuG4ebqpz3m5z8_2Eg
have been successfully retrieved and stored in the directory:
/home/nuno/grid/hello/bcbr/out
================================================================================
[nuno@ui-enmr bcbr]$ ll ./out/
total 4
-rw-r--r-- 1 nuno users 0 Jan 26 17:31 hello.err
-rw-r--r-- 1 nuno users 48 Jan 26 17:31 hello.out
[nuno@ui-enmr bcbr]$ more ./out/hello.out
Hello Grid! I was here : wn3-enmr.cerm.unifi.it
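The slides do not show the script behind `hello.jdl`. As an illustration only, a payload producing the same line could look like the following sketch. The function name is hypothetical, and the real job was most likely a short shell script.

```python
import socket
from typing import Optional


def greeting(hostname: Optional[str] = None) -> str:
    """Build the hello message for the given (or the local) host name."""
    host = hostname or socket.getfqdn()
    return f"Hello Grid! I was here : {host}"


if __name__ == "__main__":
    # On a worker node this prints the node's own name to standard output,
    # which the JDL would redirect to hello.out.
    print(greeting())
```

For the worker node on the slide the message plus its trailing newline is 48 bytes long, which matches the size of `hello.out` in the listing.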
CNS example (1/3)
18/24
[nuno@ui-enmr cns-example]$ ll
total 160
-rw-r--r-- 1 nuno users 144884 Mar 18 2009 cns-input.tgz
-rw-r--r-- 1 nuno users 1529 Mar 18 2009 README
-rwxr-xr-x 1 nuno users 134 Mar 18 2009 run-cns
-rw-r--r-- 1 nuno users 229 Jan 17 17:58 run-cns.jdl
[nuno@ui-enmr cns-example]$ tar tvzf cns-input.tgz
-rw-r--r-- abonvin/staff 30070 2008-05-06 12:42:33 CaMM13Tmpcs1.tbl
-rw-r--r-- abonvin/staff 16946 2008-05-06 12:42:33 CaMM13Tmrdc1.tbl
-rw-r--r-- abonvin/staff 912 2008-05-06 12:44:53 README
-rw-r--r-- abonvin/staff 208142 2008-05-06 12:42:33 calmodulin-MM13.pdb
-rw-r--r-- abonvin/staff 341327 2008-05-06 12:42:33 calmodulin-MM13.psf
-rw-r--r-- abonvin/staff 4982 2008-05-06 12:42:33 ion.param
-rw-r--r-- abonvin/staff 158398 2008-05-06 12:42:33 noes.tbl
-rw-r--r-- abonvin/staff 548 2008-05-06 12:42:33 par_axis.pro
-rw-r--r-- abonvin/staff 74090 2008-05-06 12:42:33 parallhdg5.3.pro
-rw-r--r-- abonvin/staff 16549 2008-05-06 12:42:33 phipsi.tbl
-rw-r--r-- abonvin/staff 9571 2008-05-06 12:42:33 sa-test.inp
-rw-r--r-- abonvin/staff 273 2008-05-06 12:42:33 tensor.pdb
-rw-r--r-- abonvin/staff 1181 2008-05-06 12:42:33 tensor.psf
-rw-r--r-- abonvin/staff 57 2008-05-06 12:42:33 tensor.tbl
http://www.enmr.eu/eNMR-tutorials
CNS example (2/3)
19/24
[nuno@ui-enmr cns-example]$ more run-cns
source $VO_ENMR_EU_SW_DIR/BCBR/cns/1.2-para/set_cns.bash
tar xfz cns-input.tgz
cns < sa-test.inp > sa-test.out
tar cvfz cns-output.tgz *
[nuno@ui-enmr cns-example]$ more run-cns.jdl
Executable = "run-cns";
StdOutput = "std.out";
StdError = "std.err";
InputSandbox = {"cns-input.tgz","run-cns"};
OutputSandbox = {"std.out", "std.err","cns-output.tgz"};
Requirements = RegExp ("chem.uu.nl",other.GlueCEUniqueId);
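The `Requirements` line restricts the job to computing elements whose identifier matches the pattern `chem.uu.nl`. Its effect can be sketched by applying the same pattern to the CE identifiers from the "Available resources" slide. This assumes the JDL `RegExp` behaves like an unanchored search, as Python's `re.search` does; the function name is hypothetical.

```python
import re

# CE identifiers as listed by lcg-infosites on the "Available resources" slide.
CE_IDS = [
    "grid-ce-01.ba.infn.it:2119/jobmanager-lcgpbs-short",
    "ce-enmr.chem.uu.nl:2119/jobmanager-lcgpbs-medium",
    "glite-ce.grid.uj.ac.za:8443/cream-pbs-long",
    "trekker.nikhef.nl:2119/jobmanager-pbs-medium",
    "deimos.htc.biggrid.nl:2119/jobmanager-pbs-medium",
    "t2-ce-05.lnl.infn.it:8443/cream-lsf-enmr1",
]


def matching_ces(pattern, ce_ids):
    """Return the identifiers in which the pattern occurs anywhere."""
    regex = re.compile(pattern)
    return [ce_id for ce_id in ce_ids if regex.search(ce_id)]


if __name__ == "__main__":
    print(matching_ces("chem.uu.nl", CE_IDS))
```

Note that an unescaped dot matches any character, so the pattern is slightly looser than the literal host suffix. Escaping the dots would make it strict.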
CNS example (3/3)
20/24
[nuno@ui-enmr cns-example]$ glite-wms-job-submit -a -o jid run-cns.jdl
[nuno@ui-enmr cns-example]$ glite-wms-job-output -i jid --dir ./
[nuno@ui-enmr cns-example]$ ll
total 24464
-rw-r--r-- 1 nuno users 144884 Mar 18 2009 cns-input.tgz
-rw-r--r-- 1 nuno users 24854174 Jan 26 18:24 cns-output.tgz
-rw-r--r-- 1 nuno users 79 Jan 26 17:13 jid
-rw-r--r-- 1 nuno users 1529 Mar 18 2009 README
-rwxr-xr-x 1 nuno users 137 Jan 26 17:12 run-cns
-rw-r--r-- 1 nuno users 229 Jan 17 17:58 run-cns.jdl
[nuno@ui-enmr out]$ more sa_1.pdb
REMARK FILENAME="/home/enmr016/globus-tmp.wn23-enmr.25892.0/https_3a_2f_2flb-"
… snip …
REMARK DATE:26-Jan-2010 17:29:14 created by user: enmr016
REMARK VERSION:1.2
ATOM 1 HA ALA 1 1.868 27.047 -8.664 1.00 15.00 A
ATOM 2 CB ALA 1 0.511 28.488 -7.902 1.00 15.00 A
ATOM 3 HB1 ALA 1 0.379 28.981 -8.854 1.00 15.00 A
… snip …
21/24
Web Portal Grid Interaction
Ongoing work: GROMACS web portal
22/24
Zwartkijken / Idées Noires - Franquin
“Life cycle of a GRID computing job?
That's something like:
conception..,
abortion..,
conception..,
birth..,
premature death..,
reanimation.., etc?
:p
T.”
20100127 – 11AM
23/24
24/24
Acknowledgments
Big-Picture layer
/O=dutchgrid/O=users/O=universiteit-utrecht/OU=chem/CN=Alexandre Bonvin
Rolf Boelens
Hardware layer
/O=dutchgrid/O=users/O=universiteit-utrecht/OU=chem/CN=Johan van der Zwan
Middleware layer
/C=IT/O=INFN/OU=Personal Certificate/L=Padova/CN=Cristina Aiftimiei
Application layer
/O=dutchgrid/O=users/O=universiteit-utrecht/OU=chem/CN=Marc van Dijk
/O=dutchgrid/O=users/O=universiteit-utrecht/OU=chem/CN=Sjoerd De Vries
/O=dutchgrid/O=users/O=universiteit-utrecht/OU=chem/CN=Tsjerk Wassenaar
User layer
/O=dutchgrid/O=users/O=universiteit-utrecht/OU=chem/CN=*.*
Service Availability Monitoring
25/24
Grid Operations Center Data Base
26/24
Building a Grid
27/24
1. The architecture
2. The hardware
3. The middleware
Network
Resources
Middleware
Application
User-centric
