1
Ganga
An interface to the LHC computing grid
Matt Williams
University of Birmingham
2
CERN and the LHC
● Largest particle physics experiment in the world
● 27km in circumference
● Over 100m underground
● Thousands of physicists
● 100s of petabytes of data
3
The Grid
4
GANGA
● ~2001 LHCb started GANGA, an in-house tool
– Specific to our needs
● By 2010, when the LHC turned on, it was used by many more
– ATLAS, NA62, T2K and many smaller experiments
● Python had always been the obvious choice
– Used everywhere in Particle Physics (along with C++)
– Easy to create new plugins for experiments
● Can be scripted or used through an IPython-based interactive console
● Open source, released under the GPL (like most CERN software)
5
How is it used
j = Job(name = 'Example job')
j.application = Executable()
j.application.exe = File('test.sh')
j.outputfiles = [LocalFile('out.txt')]
j.backend = Local()
j.submit()
6
Retrieving results
In [1]: j.peek()
total 200
-rw-r--r-- 1 phrfbi lhcb 0 Jun 22 2013 __syslog__
-rw-r--r-- 1 phrfbi lhcb 141999 Jun 22 2013 stdout
-rw-r--r-- 1 phrfbi lhcb 53671 Jun 22 2013 stderr
-rw-r--r-- 1 phrfbi lhcb 2463 Jun 22 2013 out.txt
-rw-r--r-- 1 phrfbi lhcb 135 Jun 22 2013 __jobstatus__
In [2]: j.peek('out.txt')
7
Using the Grid
Just change backend from Local() to LCG()
Other backends are Interactive, PBS, LSF, SGE, Panda, Jedi, Dirac,
Condor, ARC, CREAM...
8
Input data and splitting
j = Job(name = 'Input splitter', backend = LCG())
j.application = Executable()
j.application.exe = File('analyse_data')
j.inputfiles = [LocalFile(f.strip())
                for f in open('inputs.txt')]
j.splitter = SplitByFiles(filesPerJob = 10)
j.outputfiles = [LocalFile('histogram.root')]
j.submit()
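To make the effect of the splitter concrete, here is a minimal plain-Python sketch of grouping input files into subjobs of at most 10 files each. It does not use Ganga: `chunk_files` and the file names are hypothetical, chosen for illustration, and this is not a description of how SplitByFiles is implemented.

```python
def chunk_files(files, files_per_job=10):
    """Group a list of input files into consecutive chunks, one per subjob."""
    if files_per_job < 1:
        raise ValueError("files_per_job must be at least 1")
    return [files[i:i + files_per_job]
            for i in range(0, len(files), files_per_job)]


# Hypothetical file names, standing in for the contents of inputs.txt.
inputs = ["data_%04d.root" % n for n in range(25)]
chunks = chunk_files(inputs, files_per_job=10)
print([len(c) for c in chunks])  # prints [10, 10, 5]
```

Each chunk would become the input of one subjob, so the last subjob may receive fewer files than the others.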
9
Mergers
j = Job(name = 'Merger', backend = LCG())
j.application = Executable()
j.application.exe = File('analyse_data')
j.inputfiles = [LocalFile(f.strip())
                for f in open('inputs.txt')]
j.splitter = SplitByFiles(filesPerJob = 10)
j.outputfiles = [LocalFile('histogram.root')]
j.merger = RootMerger(files = ['histogram.root'])
j.submit()
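Conceptually, merging combines the histogram.root output of every subjob into a single result. The plain-Python sketch below adds histograms bin by bin, with each histogram represented as a list of bin counts. Both that representation and the `merge_histograms` helper are assumptions made for illustration; they do not describe how RootMerger works internally.

```python
def merge_histograms(histograms):
    """Add equally binned histograms bin by bin."""
    histograms = list(histograms)
    if not histograms:
        return []
    n_bins = len(histograms[0])
    if any(len(h) != n_bins for h in histograms):
        raise ValueError("all histograms must have the same number of bins")
    return [sum(bins) for bins in zip(*histograms)]


# Made-up bin counts from three subjobs.
subjob_outputs = [[1, 0, 4], [2, 3, 0], [0, 1, 1]]
print(merge_histograms(subjob_outputs))  # prints [3, 4, 5]
```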
10
Job catalogue
In [1]: jobs
Out [1]:
fqid | status | name | subjobs | application | backend
----------------------------------------------------------------------
0 | completed | Example job | | Executable | Local
1 | running | Input splitter | 324 | Executable | LCG
2 | running | Merger | 324 | Executable | LCG
11
Full API access
In [2]: jobs(2).status
Out [2]: running
In [3]: len([j for j in jobs(2).subjobs if j.status == 'completed'])
Out [3]: 24
In [4]: for subjob in jobs(2).subjobs:
   ...:     if subjob.status == 'failed':
   ...:         subjob.resubmit()
Can define custom functions in ~/.ganga.py which will be available at runtime
12
Dealing with large files
j = Job(name = 'Large output', backend = Dirac())
j.application = Executable()
j.application.exe = File('analyse_data')
j.inputfiles = [DiracFile('input.root')]
j.outputfiles = [DiracFile('histogram.root')]
j.submit()
13
Find more at cern.ch/ganga
Download code from cern.ch/ganga/download/
Thank you
