Computer Physics Communications 152 (2003) 135–143
www.elsevier.com/locate/cpc
Conditions database system of the COMPASS experiment
T. Toeda a,*, M. Lamanna b, V. Duic c, A. Manara d,1
a Nagoya University, Furo-cho, Chikusa-ku, 464-8602 Nagoya, Japan
b CERN, 1211 Geneve 23, Switzerland
c INFN Trieste, Sezione di Trieste, e Università di Trieste, Via A. Valerio, 2, I-34127 Trieste, Italy
d Università di Torino, Via P. Giuria, 1, I-10125 Torino, Italy
Received 15 April 2002; received in revised form 15 July 2002
Abstract
The CERN SPS experiment COMPASS has integrated a Conditions Database System in its off-line software. The system is
used to manage time-dependent information, such as detector condition, calibration, and geometrical alignment data, using
a package provided by the CERN IT/DB Group. This integrated system consists of administration tools, a data handling library, and
data transfer software from the detector control system to the Conditions Database. In this paper, the status of the Conditions
Database project is described, and the results of performance tests on the COMPASS computing farm are given.
© 2002 Elsevier Science B.V. All rights reserved.
PACS: 07.05.Kf; 07.05.-t; 07.05.Bx; 29.85.+c
Keywords: High energy physics; Conditions database; Objectivity/DB; SCADA; PVSS
1. Introduction
COMPASS (COmmon Muon Proton Apparatus for Structure and Spectroscopy) [1] is a fixed-target experiment with an extensive physics programme at the CERN SPS. The apparatus performs a number of different measurements, in different configurations, notably using both muon and hadron beams in the 100–300 GeV range at very high intensities. The COMPASS experiment started taking data in Summer 2001.
* Corresponding author.
E-mail address: eda@kiso.phys.nagoya-u.ac.jp (T. Toeda).
1 Present address: ITU, Place des Nations, CH-1211 Geneva 20,
Switzerland.
The large volume of data to be processed (during a few months each year) and the need for a flexible software environment to cope with the experiment's different configurations and measurements pushed the COMPASS Collaboration to design the off-line analysis software from scratch and build a dedicated facility for the off-line computing, namely the COMPASS Computing Farm (CCF) [2].
The off-line system has to meet severe design constraints, namely the high data acquisition rate (35 MB/s) and the very large data sample (10 G events, 30 kB each, 300 TB/y) to be reconstructed virtually on-line.
COMPASS decided to use the CERN Central Data Recording (CDR) System [3] to record all the data: the on-line system does not write events on tape at the experiment site, but sends them over a few kilometres of
dedicated optical fibre network to the computer centre, where the CCF, the tape servers, and the corresponding high-speed tape drives are located.
The estimated computing power to reconstruct all the events at the speed of the data acquisition is 2000 SPECint95, which is provided by some 100 dual-CPU Linux PCs. The chosen network technologies are Gigabit and Fast Ethernet. A disk pool of a few TB has been set up, initially made up of SCSI disks, but more recently using the less expensive EIDE disks.
The COMPASS off-line analysis group has developed a Conditions Database (called CDB in the literature) System to manage detector condition, calibration and geometrical alignment information, using the CDB package developed at CERN in the IT/DB Group.
In this paper, the COMPASS computing environment is introduced. Then the conception and design of all the software components of the CDB system are described. The results of several performance tests are given, together with some investigations on the subject of remote access via a wide-area network.
2. COMPASS computing environment
The COMPASS experiment has an extensive programme of physics, ranging from studies of the structure of the nucleon to spectroscopy studies. Common to all the proposed measurements are the high trigger rate and the long acquisition time: the Level-1 trigger rate ranges between 10 and 100 kHz with a typical event size of the order of 30 kB. When the Level-1 trigger rate is of the order of 100 kHz (spectroscopy studies), a second-level filter will be put in place to reduce the data rate to tape to an effective 10 kHz. This high event rate is a big challenge for the experimental apparatus and the corresponding computing environment.
Fig. 1 shows the structure of the COMPASS computing environment, which is composed of two parts: the on-line computing farm located close to the experiment and the off-line computing farm in the computer centre. A Gigabit Ethernet line connects these two parts.
The on-line computing farm is used for collecting
and building the event data in the data acquisition
Fig. 1. Structure of the computing environment.
Fig. 2. Schematic view of CORAL.
systems [4] by using DATE [5], which is a multiprocessor distributed environment.
The whole detector system is under a slow control system, which accesses all the detectors by a serial bus and monitors their status. The system is implemented with PVSS [6], a commercial SCADA (Supervisory Control And Data Acquisition) system. The controlled status of the detectors is kept in a proprietary database.
The off-line computing farm, the CCF, is used for off-line event reconstruction, monitoring and analysis; it consists of 100 dual-CPU Linux PCs, load-balanced using LSF (Load Sharing Facility) [7]. In addition, several Gigabit-Ethernet servers are used for receiving the data from the on-line farm to populate the databases and as database servers.
The event data transferred to the off-line farm are kept in Objectivity/DB [8] databases, which are directly created in the namespace of CASTOR (CERN Advanced Storage Manager [9]). The database can be accessed from the disk image from the beginning, whilst CASTOR ensures that the file is copied onto tape simultaneously. The COMPASS off-line analysis group has developed the COMPASS off-line reconstruction and analysis framework, CORAL [10], using C++ and Object-Oriented programming techniques.
CORAL has been designed with a modular architecture, as shown in Fig. 2, and provides all the basic functions, such as initialization, data input/output, error logging, and so on. Moreover, the external packages ROOT [11], Objectivity/DB and HBOOK [12] are available via insulation layers. This choice ensures good flexibility in case of change of the external packages. The same strategy has been adopted for the particle track reconstruction components, with well-defined interfaces for each component.
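To make the insulation-layer idea concrete, a minimal C++ sketch is given below; the class and method names (PersistencyService, ObjectivityService) are invented for illustration and do not reproduce the actual CORAL interfaces.

#include <iostream>
#include <memory>
#include <string>

// Hypothetical insulation-layer interface: client code talks only to this
// abstract class, never to a concrete external package directly.
class PersistencyService {
public:
    virtual ~PersistencyService() = default;
    virtual void open(const std::string& name) = 0;
    virtual void write(const std::string& object) = 0;
};

// One concrete implementation per external package; changing packages
// means providing a new sub-class, not touching the client code.
class ObjectivityService : public PersistencyService {
public:
    void open(const std::string& name) override {
        // here the real layer would open an Objectivity/DB federation
        std::cout << "open federation " << name << "\n";
    }
    void write(const std::string& object) override {
        std::cout << "store " << object << "\n";
    }
};

int main() {
    std::unique_ptr<PersistencyService> db = std::make_unique<ObjectivityService>();
    db->open("compass_fd");
    db->write("event_0001");
}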
3. COMPASS Conditions Database system
A package for the implementation of an experimental Conditions Database was originally proposed and developed in BaBar [13] at SLAC to be used for physics event data analysis. Fig. 3 shows the basic idea of the Conditions Database. Each experimental condition quantity is represented by a persistent object in the Object Database with a validity time interval. Whenever a new calibration object is created, it is stored with its validity time interval (assigned by the user). In case this overwrites a previous calibration, the old calibration is still kept for comparison and the
Fig. 3. Concept of the Conditions Database. Each box represents a condition object and the last layer is the most recent.
Fig. 4. Hierarchical structure of the Conditions Database API of the CERN edition.
new one becomes the default one. This mechanism is
called condition versioning.
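As a rough illustration of the validity-interval and versioning idea, the following C++ sketch keeps condition objects in insertion order and returns, for a given event time, the most recently stored object whose interval covers it; the types and container are purely illustrative and are not the CDB API.

#include <cstdint>
#include <iostream>
#include <string>
#include <vector>

// A condition object carries a payload and the interval during which it is valid.
struct Condition {
    std::int64_t validFrom;   // start of validity (e.g. seconds since epoch)
    std::int64_t validUntil;  // end of validity
    std::string  payload;     // calibration data, here just a string
};

// Toy "folder": conditions are appended in insertion order, so the last
// matching entry is the newest version (older ones are kept for comparison).
class ConditionFolder {
    std::vector<Condition> entries_;
public:
    void store(const Condition& c) { entries_.push_back(c); }

    // Return the most recent version valid at the given time, if any.
    const Condition* find(std::int64_t t) const {
        const Condition* best = nullptr;
        for (const auto& c : entries_)
            if (c.validFrom <= t && t < c.validUntil) best = &c;
        return best;
    }
};

int main() {
    ConditionFolder dcCalib;
    dcCalib.store({0, 1000, "old drift-chamber calibration"});
    dcCalib.store({0, 1000, "improved drift-chamber calibration"});  // new version
    if (const Condition* c = dcCalib.find(500))
        std::cout << c->payload << "\n";   // prints the improved version
}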
Given the interest in the CERN HEP community, a new package called CDB was developed by the CERN IT-DB Group [14]. A hierarchical structure (inspired by the UNIX file system) is available in the CERN edition. Fig. 4 shows a schematic view of the structure. The folder-set and folder correspond to a directory and a file in the UNIX file system, respectively. Presently, the CERN CDB API is implemented using Objectivity/DB as the database engine. The COMPASS collaboration has developed a Conditions Database System in the off-line system by using the CDB API of the CERN edition. The COMPASS system has three main components: the administration tools, the handling library, and the data transfer program.
3.1. Administration tools of the conditions database
The administration tools offer management functions and interactive functions for the administrator and the general user, respectively. The main functions allow new databases, folder-sets and folders to be created in the CDB. Ordinary users can only create folders, browse the list of folder-sets and folders, scan the data in a given time interval, and draw the value versus time.
All the administration tools have the same structure. The interaction with the user is done by Perl scripts which handle the parameters, check the consistency of the request and the permissions of the action, and finally execute the command (a C++ executable); these scripts also collect and present the results. The
C++ executables perform the real interaction with the CDB, such as loading new data, retrieving the calibration for a given time (interval), browsing, or altering the folder-set structure. A shell front-end command for the CDB has been developed, which resembles a UNIX shell. Users can interact with the CDB with ls, cd, and so on. Fig. 5 shows an extract of a user session.
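The C++ executables invoked by these scripts follow a pattern along the lines of the sketch below; the CdbSession class and its methods are invented stand-ins for the actual CERN CDB API calls, which are not reproduced here.

#include <iostream>
#include <string>

// Illustrative stand-in for the CDB session; the real executables would call
// the CERN CDB API (Objectivity/DB based) instead of these stubs.
struct CdbSession {
    bool createFolder(const std::string& path) {
        std::cout << "create folder " << path << "\n";
        return true;
    }
    bool storeData(const std::string& path, long from, long until,
                   const std::string& file) {
        std::cout << "store " << file << " in " << path
                  << " valid [" << from << "," << until << ")\n";
        return true;
    }
};

// Hypothetical invocation by a Perl wrapper script:
//   cdb_store /COMPASS/DC/t0 1009843200 1009846800 t0_calib.dat
int main(int argc, char** argv) {
    if (argc != 5) {
        std::cerr << "usage: cdb_store <folder> <from> <until> <datafile>\n";
        return 1;
    }
    CdbSession session;
    session.createFolder(argv[1]);  // no-op if the folder already exists
    return session.storeData(argv[1], std::stol(argv[2]),
                             std::stol(argv[3]), argv[4]) ? 0 : 2;
}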
3.2. Conditions Database Handling Library for the event reconstruction program
3.2.1. Implementation of the handling library
The event reconstruction program needs a set of detector calibration information for event reconstruction. Some of the calibration sets might change while the analysis proceeds, for example when better calibration procedures can be put in place.
[lxplus061] ~ > cdb_sh
Wait 5 sec...
"help" command is available
/COMPASS >help
ls : print list in current directory
ls -R : print all
ls -c : print only container
cd <dir> : change directory
pwd : print current directory
scan <container> : print time intervals;
plot_pvss <container> : plot the graph of PVSS data
exit or Ctrl-C : exit
/COMPASS >ls
drw-r----- admin vy PA:/COMPASS/PA
drw-r----- admin vy PB:/COMPASS/PB
drw-r----- admin vy PS:/COMPASS/PS
drw-r----- admin vy DC:/COMPASS/DC
drw-r----- admin vy MM:/COMPASS/MM
drw-r----- admin vy GM:/COMPASS/GM
drw-r----- admin vy ST:/COMPASS/ST
drw-r----- admin vy SI:/COMPASS/SI
drw-r----- admin vy FI:/COMPASS/FI
drw-r----- admin vy MA:/COMPASS/MA
drw-r----- admin vy MB:/COMPASS/MB
drw-r----- admin vy HC:/COMPASS/HC
drw-r----- admin vy RI:/COMPASS/RI
drw-r----- admin vy HO:/COMPASS/HO
drw-r----- admin vy HM:/COMPASS/HM
drw-r----- admin vy HI:/COMPASS/HI
drw-r----- admin vy HL:/COMPASS/HL
drw-r----- admin vy VT:/COMPASS/VT
drw-r----- admin vy GEOM:/COMPASS/GEOM
drw-r----- admin vy CDB:/COMPASS/pvss
drw-r----- admin vy BM:/COMPASS/BM
drw-r----- admin vy CDB:/COMPASS/RUN
/COMPASS >
Fig. 5. cdb_sh example: in the output of ls, admin and vy are the user and group ID, respectively. The string in the last column is the name of the database and the path of the folder-set. Different names like GEOM or CDB refer to physically distinct databases.
Fig. 6. The sequence diagram for event reconstruction. Before the reconstruction, the required conditions are read from the CDB to the CORAL
cache and the transaction is closed (all locks are released), then the reconstruction is started. The conditions are read from the cache in the
reconstruction phase without accessing the CDB any more.
The connection between CORAL and the CDB is done in the handling library, which isolates the CDB layer and takes care of the communication.
There are two important requirements for the library. The first one is to offer interface methods for accessing the right calibration information for the event reconstruction module in the framework. The second one is a conditions caching mechanism in memory. The CDB itself does not provide much caching, and therefore the communication between CORAL and the CDB could last for the entire duration of the reconstruction. With many reconstruction jobs running concurrently, the CDB would then be kept open in reading mode at all times. Owing to the locking mechanism of Objectivity/DB, although multiple readers are always accepted, new calibrations could not be entered in the CDB in that situation. Since the time interval of the calibration is known in advance, all required calibrations can instead be read and the database closed before the reconstruction. This minimizes the load on the CDB.
Fig. 7. Structure of the CDB module. CDB is the super-class to
declare the interface methods, while CDB Handler and FileDB are
sub-classes to define the implementation.
Fig. 6 shows the reconstruction scenario. Before the actual reconstruction starts, all the required conditions must be read into the handling library's cache, which will then be used to reconstruct the events. Clearly this is feasible in an off-line system, where the conditions are frozen at the time of processing (a new set of calibrations will be used for a full reprocessing).
Fig. 7 shows the class diagram of the library, which is constructed with one super-class and two sub-classes. The common interface is declared in the super-class, while the actual implementations are defined in the sub-classes. The first sub-class, FileDB, reads the data from calibration tables stored on a network file system, AFS. The second sub-class, CDB Handler, reads the calibrations from Objectivity/DB.
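A minimal C++ sketch of this structure, combining the super-class interface with the pre-read cache of Fig. 6, is shown below; the class and method names are illustrative only and do not reproduce the actual CORAL code.

#include <iostream>
#include <map>
#include <string>

// Super-class: the interface the reconstruction code sees, independent of the source.
class CDB {
public:
    virtual ~CDB() = default;
    // Read every condition needed for the given time into the in-memory
    // cache, then release any database locks (cf. Fig. 6).
    virtual void preload(long eventTime) = 0;
    // Served from the cache during reconstruction, without touching the CDB.
    std::string get(const std::string& folder) const {
        auto it = cache_.find(folder);
        return it != cache_.end() ? it->second : std::string();
    }
protected:
    std::map<std::string, std::string> cache_;
};

// Sub-class reading calibration tables from files on AFS.
class FileDB : public CDB {
public:
    void preload(long) override {
        cache_["/COMPASS/DC/t0"] = "t0 table read from AFS";  // illustrative
    }
};

// Sub-class reading calibrations from the Objectivity/DB-based CDB.
class CDBHandler : public CDB {
public:
    void preload(long) override {
        // the real implementation would open a read transaction, copy the
        // conditions valid at eventTime into the cache, and close it
        cache_["/COMPASS/DC/t0"] = "t0 table read from the CDB";
    }
};

int main() {
    CDBHandler db;
    db.preload(1009843200);                        // before reconstruction
    std::cout << db.get("/COMPASS/DC/t0") << "\n"; // during reconstruction
}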
3.2.2. Performance test
The performance tests try to reproduce, and push to the limit, the conditions expected during CCF operation. During such operation, physics event data sets will be analyzed in parallel by concurrent jobs, each new set being read and processed by some 20 jobs; the peak activity on the CDB occurs at the start of each set, when the calibration constants are loaded by each job.
To mimic this situation, the following set-up was used: one single database server hosts all the calibration data (10 databases) and these data are accessed by an increasing number of jobs. The jobs are strictly synchronized to the database initialization, and all reading programs read the same calibration set. The Advanced Multi-threaded Server (AMS) provides remote access to the Objectivity/DB databases.
A second machine (a Sun 5 workstation with a Fast Ethernet connection) keeps all the central database management information (database catalogue and schema) and grants the locks to regulate the database access.
Figs. 8 and 9 summarize a full series of tests (from
1 to 100 clients); the size of the data read by each client
is 100 MB and the AMS is running with 32 threads.
In Fig. 8(a) the number of active clients is shown as
a function of time with up to 100 jobs being launched
simultaneously. In Fig. 8(b) the output network traffic
measured at the Ethernet card of the database server is
shown as a function of time.
Fig. 8(c) shows the speed of the database server as a
function of the number of active clients, evaluated by
Fig. 8. Status of the database server: (a) the number of concurrent clients; (b) the outgoing data volume from the network card of the AMS server; (c) the evaluated speed of the DB server.
Fig. 9. CPU consumption of the AMS server: (a) standard, and (b) CERN edition. The shaded area indicates the part of the total CPU consumed by the system (Linux kernel).
measuring the read time of each client and summing
over all the active clients. Similar performance is
observed when varying the number of AMS threads
(from four to the allowed maximum of 32) and with a
data size of 10 MB.
In Fig. 9(a) the total CPU consumption of the AMS
is shown as a function of time; the shaded area of the
plot shows the CPU used by the system. Note that
when the number of concurrent clients exceeds 10, the
AMS reaches ≈100% CPU utilization of both CPUs.
The CERN-IT/DB Group provided a modified
AMS, which is capable of transparently interacting
with the mass storage system and handles more
efficiently the case where many database files are open
at the same time. The data rate obtained is similar, but
the CPU consumption is considerably lower, as shown
in Fig. 9(b).
When the CDB is read, the typical working point for the CCF is below 20 concurrent clients, because typically 20 jobs are started at the same time and they are randomized by the batch system. The CDB access is terminated well before the next batch of jobs starts. Since the CDB data are written in different files, more efficient access could be obtained by placing these files on different data servers, but this is not necessary in the present set-up.
3.3. Data transfer system from SCADA to the Conditions Database
The COMPASS experiment chose PVSS as the SCADA software for the Detector Slow Control. PVSS is based on a client-server architecture and performs its tasks through several independent components called managers. PVSS also has an internal archive to temporarily store the data. The requirement of easy access to the Slow Control data for the off-line analysis suggests transferring these data into an off-line database. We devised a prototype software system for transferring the data from the archive to the CDB [15]. The set-up that we used to study this system is sketched in Fig. 10.
The CPU usage for various managers is shown in
Fig. 11 for three cycles during which 7 MB of data
is handled. The important feature of this system is
that the data transfer is controlled in a single database
transaction, which guarantees that no data are lost.
Fig. 10. PVSS set-up. The data are retrieved from the PVSS archive.
The Controller program contacts the Sender program that retrieves
the data and sends them to the off-line system. The Formatter
program stores the data in the CDB.
Fig. 11. Snapshot of the transfer test.
The tests performed with mock data suggest that a throughput of the order of 1 MB/s can be obtained. Extrapolation to the COMPASS experiment, where we expect as many as 20,000 Slow Control channels, suggests a cyclic process, active for about one minute in every hour (transferring one hour of Slow Control data from the PVSS archive and storing it into the CDB as an STL string).
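The transfer pattern described above, where one cycle of archived Slow Control data is moved inside a single database transaction so that a failed transfer loses nothing, can be sketched as follows; the ArchiveReader and CdbWriter classes are illustrative stand-ins, not the PVSS or CDB APIs.

#include <iostream>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Illustrative stand-ins for the PVSS archive reader and the CDB writer.
struct ArchiveReader {
    // One hour of data per Slow Control channel, serialized as a string.
    std::vector<std::pair<std::string, std::string>> readLastHour() {
        return {{"/COMPASS/pvss/HV_channel_0001", "values..."}};
    }
};

struct CdbWriter {
    void begin()  { std::cout << "begin transaction\n"; }
    void store(const std::string& folder, const std::string& data) {
        std::cout << "store " << data.size() << " bytes in " << folder << "\n";
    }
    void commit() { std::cout << "commit\n"; }
    void abort()  { std::cout << "abort\n"; }
};

// Executed roughly once per hour: either the whole hour of Slow Control
// data reaches the CDB, or the transaction is aborted and nothing is lost.
void transferCycle(ArchiveReader& archive, CdbWriter& cdb) {
    cdb.begin();
    try {
        for (const auto& [channel, data] : archive.readLastHour())
            cdb.store(channel, data);
        cdb.commit();
    } catch (const std::exception&) {
        cdb.abort();   // the archive still holds the data; retry next cycle
        throw;
    }
}

int main() {
    ArchiveReader archive;
    CdbWriter cdb;
    transferCycle(archive, cdb);
}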
This transfer program was already used to transfer
the real data at the end of the experimental run of 2001.
4. Accessing the Conditions Database of COMPASS over Wide Area Network
From within CORAL code, it is possible to access the Conditions Database over a wide-area network. This is needed to avoid exporting the whole Conditions Database to remote sites and relying on other tools to guarantee the synchronization of the various replicas. Two approaches have been followed, which use the currently most popular distributed OOP paradigms.
RMI (Java Remote Method Invocation): As both CORAL and the CDB library are implemented in C++, this technology requires RMI to be interfaced to native code through JNI (Java Native Interface). On the server side, a Java RMI-capable object, implementing a remote interface, is invoked and retrieves the conditions data from the CDB through JNI, by means of a dynamically loaded library which wraps the CDB library. On the client side, CORAL code can instantiate, via JNI, a Java Virtual Machine and a Java object that acts as an RMI client (invoking the remote methods of the RMI-capable object).
CORBA (Common Object Request Broker Architecture): This architecture allows a more elegant and straightforward approach. On the server side, a CORBA object exposes its remote interface, implemented in a class linked against a C++ library. On the client side, from within CORAL, the exposed methods of the remote CORBA object can be invoked as if it were an ordinary C++ object, once its Interoperable Object Reference (IOR) has been obtained.
Both approaches have been investigated and working prototypes exist.
5. Summary
This paper describes the COMPASS Conditions Database System for off-line event reconstruction and the detector slow control system, which consists of three software packages: the administration tools, the CDB handling library and the data transfer system from PVSS to the CDB. The system is already in use in the experiment and has since been improved.
Acknowledgements
We would like to thank all the members of the
COMPASS off-line group for valuable discussions
and the CERN IT Division for providing excellent
hardware and software support. T. T. would like to
gratefully thank Prof. N. Horikawa for his support, and all the members of the PT Group in the Physics Department of Nagoya University.
References
[1] COMPASS proposal, CERN-SPSLC-96-14;
COMPASS addendum 1, CERN-SPSLC-96-30.
[2] M. Lamanna, The COMPASS Computing Farm project, in: M. Mazzucato (Ed.), Proceedings of the CHEP 2000 Conference, Padova, February 2000, p. 576.
[3] CERN CDR home page, http://cdr.web.cern.ch/cdr/.
[4] H. Fischer, et al., Nucl. Instrum. Methods A 461 (2001) 507.
[5] H. Becker, et al., Interprocess and interprocessor data flow in the KLOE data acquisition system, in: Proceedings of Computing in High Energy Physics, 1995.
[6] PVSS Funktionbeschreibung, ETM GmbH, 2001;
PVSS home page, http://www.pvss.com.
[7] LSF Reference Guide, Platform Computing Corporation, June
2000;
LSF home page, http://www.platform.com.
[8] Objectivity/DB Technical Overview, Objectivity Inc., January
2001;
Objectivity/DB home page, http://www.objectivity.com.
[9] CASTOR home page, http://wwwinfo.cern.ch/pdp/castor/Welcome.html.
[10] A. Martin, Comput. Phys. Comm. 140 (2001) 82;
See also CORAL home page, http://coral.cern.ch/.
[11] R. Brun, F. Rademakers, Nucl. Instrum. Methods A 389 (1997)
81.
[12] R. Brun et al., HBOOK: User Guide, CERN-DD-75-11.
[13] I. Gaponenko, et al., An overview of the BaBar Conditions
Database, in: M. Mazzucato (Ed.), Proceedings of the CHEP
2000 Conference, Padova, February 2000, p. 406.
[14] Conditions Database CERN edition home page, http://wwwinfo.cern.ch/db/objectivity/docs/conditionsdb/.
[15] M. Lamanna, A. Manara, Integration of the COMPASS Conditions Database with the Slow Control software, submitted as a COMPASS internal note; see also the COMPASS CDB page, http://wwwcompass.cern.ch/compass/software/offline/cdb/.