gLite Data Management System - Presentation Transcript
Architecture of the gLite Data Management System
Leandro Neumann Ciuffo, INFN-Catania (Italy)
EELA-2 Tutorial, Montevideo, 22.07.2009
Outline
Challenges of data management in a Grid infrastructure
Initial definitions
Types of Storage Elements
File naming conventions
File catalogue
Practical exercises (hands on)
Be prepared for a bunch of acronyms!
gLite DMS – EELA-2 Tutorial, 22.07.2009
Challenges
Heterogeneity
Data are stored on different storage systems using different access technologies
Distribution
Data are stored in different locations (in most cases there is no shared file system or common namespace)
Data need to be moved between different locations
Data description
Data are stored as files (need to describe and locate them according to their content)
Corresponding gLite services: Storage Resource Manager interface, File Catalogue, File Transfer Service, Metadata Service
Getting started
The Storage Element (SE) is the service which allows users and applications (programs) to store/retrieve data (files)
The DMS provides services for locating, accessing and transferring files
Users do not need to know a file's location, only its logical name.
Files can be replicated or transferred to several locations (SEs) as needed.
Files are shared within a VO
Files are write-once, read-many
Files cannot be modified in place; they can only be removed or replaced
No intention of providing a global file management system
Getting started
Files located in the Storage Elements (SEs)…
Are mostly write-once, read-many.
Are accessible by users and applications from “anywhere” in the Grid.
Can have several replicas stored at different sites.
Cannot be modified in place, only removed or replaced.
Storage Elements (SEs)…
Provide storage space for files.
Provide a transfer protocol: GSIFTP, i.e. a GSI-based FTP server.
Provide an interface for the management of disk and tape storage resources: Storage Resource Manager (SRM)
Types of Storage Elements
dCache
Consists of a server and one or more pool nodes.
Centralized administration: a single point of access to the SE.
Files in the disk pools are presented under a single virtual filesystem tree.
Uses the GSI dCache Access Protocol (gsidcap).
CERN Advanced STORage manager (CASTOR)
Files are migrated from a disk buffer frontend to a tape-based mass storage system
Uses the insecure Remote File I/O protocol (RFIO)
Disk Pool Manager (DPM)
Used for fairly small SEs (max 10 TB of total space) with disk-based storage only.
Uses the secure RFIO protocol
Storage Resource Manager (SRM)
[Figure: myJOB is submitted from the User Interface to the Worker Nodes (A, B, C), which read input from and store output to the SEs (CASTOR, DPM, dCache)]
Storage Resource Manager (SRM)
Without the SRM, you as a user would need to know all of these storage systems!
SRM: "I talk to them on your behalf. I will even allocate space for your files. And I will use transfer protocols to send your files there."
[Figure: the SRM in front of three SEs: CASTOR, DPM and dCache]
The SRM is a single interface that takes care of local storage interaction and provides a Grid interface to the outside world.
File Naming conventions (1)
Grid Unique IDentifier (GUID)
Every file has a GUID
A non-human-readable unique identifier, e.g.: guid:38ed3f60-c402-11d7-a6b0-f53ee5a37e1d
Note: all replicas of a file will share the same GUID
Logical File Name (LFN)
An alias that can be used to refer to a file, e.g.: lfn:/grid/gilda/users/mario/myfile.dat
[Figure: Logical File Name 1 ... Logical File Name N all point to the same GUID]
File Naming conventions (2)
Storage URL (SURL) or Physical File Name (PFN)
The location of an actual file on a storage system, e.g.: srm://aliserv6.ct.infn.it/dpm/home/gilda/project1/test.dat
Note: Used by the system to find where the replica is physically stored
Transport URL (TURL)
Complete URI with the information needed to access a file in an SE (including the access protocol), e.g.: rfio://lxshare0209.cern.ch//data/alice/ntuples.dat
[Figure: Logical File Names 1 ... N point to one GUID; the GUID points to Physical Files SURL 1 ... SURL N; each SURL has a corresponding TURL]
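The four kinds of names can be told apart by their prefix. The following is a minimal sketch using only the Python standard library; the `classify` helper and the rule "any scheme other than guid, lfn or srm is a TURL" are assumptions made for illustration, not part of gLite.

```python
from urllib.parse import urlsplit

# Scheme -> kind of name. Treating every other scheme as a TURL is a
# simplifying assumption of this sketch.
KINDS = {"guid": "GUID", "lfn": "LFN", "srm": "SURL"}


def classify(name: str) -> dict:
    """Split a Grid file name into its kind, host (if any) and path."""
    parts = urlsplit(name)
    return {
        "kind": KINDS.get(parts.scheme, "TURL"),
        "scheme": parts.scheme,
        "host": parts.hostname,  # None for GUIDs and LFNs
        "path": parts.path,
    }


# The example names used in the slides.
EXAMPLES = [
    "guid:38ed3f60-c402-11d7-a6b0-f53ee5a37e1d",
    "lfn:/grid/gilda/users/mario/myfile.dat",
    "srm://aliserv6.ct.infn.it/dpm/home/gilda/project1/test.dat",
    "rfio://lxshare0209.cern.ch//data/alice/ntuples.dat",
]

for example in EXAMPLES:
    info = classify(example)
    print(f"{info['kind']:5} host={info['host']} path={info['path']}")
```

Only the SURL and the TURL carry a host name: the GUID and the LFN say nothing about where the file physically lives.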
SRM interactions
The client asks the SRM for the file providing an SURL
The SRM asks the Storage Element to provide the file
The Storage Element notifies the availability of the file and its location
The SRM returns a TURL (Transport URL), i.e. the location from where the file can be accessed
The client interacts with the storage using the protocol specified in the TURL
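The exchange above can be mimicked with a toy model. This is a sketch only: `ToyStorageElement`, `ToySRM`, `get_turl` and the physical path `/data/gilda/test.dat` are hypothetical names invented for illustration and do not correspond to the real SRM interface.

```python
from urllib.parse import urlsplit


class ToyStorageElement:
    """Stands in for the storage back end behind an SRM (hypothetical)."""

    def __init__(self, host: str, protocol: str, files: dict):
        self.host = host
        self.protocol = protocol  # access protocol this SE speaks
        self.files = files        # SURL path -> physical path on the SE

    def locate(self, path: str) -> str:
        # The SE reports where the requested file is available.
        return self.files[path]


class ToySRM:
    """Front end that turns an SURL into a TURL (hypothetical)."""

    def __init__(self, storage: ToyStorageElement):
        self.storage = storage

    def get_turl(self, surl: str) -> str:
        # The client hands over an SURL.
        parts = urlsplit(surl)
        if parts.scheme != "srm":
            raise ValueError(f"not an SURL: {surl}")
        # The SRM asks the SE for the file and learns its location.
        physical = self.storage.locate(parts.path)
        # The SRM answers with a TURL that names the access protocol.
        return f"{self.storage.protocol}://{self.storage.host}/{physical}"


se = ToyStorageElement(
    host="aliserv6.ct.infn.it",
    protocol="rfio",
    files={"/dpm/home/gilda/project1/test.dat": "/data/gilda/test.dat"},
)
srm = ToySRM(se)
turl = srm.get_turl("srm://aliserv6.ct.infn.it/dpm/home/gilda/project1/test.dat")
print(turl)
```

The client never sees the physical path until the SRM hands back the TURL, which is the point of putting a single interface in front of different storage systems.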
The LCG File Catalogue (LFC) is the service which maintains mappings between LFN(s), GUID and SURL(s)
It keeps track of the location of copies (replicas) of files
It consists of a unique catalogue, where the LFN is the main key
It looks like a “top-level” directory in the Grid
For each supported VO, a separate subdirectory exists under the "/grid" directory.
All members of a given VO have read-write permissions in that directory
The LFC Service
[Figure: User Interface, File Catalogue and three Storage Elements (SE A, SE B, SE C); the file is referred to as lfn:/grid/gilda/tcaland/mpi.txt]
The LFC Service
[Figure: structure of an LFC entry]
LFN (LFC key): /grid/dteam/dir1/dir2/file1.root
Symlink: /grid/dteam/mydir/mylink
GUID: 38ed3f60-c402-11d7-a6b0…
Replicas (SURLs): srm://host.example.com/foo/bar (on host.example.com)
Also stored: System Metadata, User Metadata
Further LFNs can be added as symlinks to the main LFN.
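The mappings kept by the catalogue can be pictured with a small in-memory model. This is a toy sketch, not the LFC API: the class `ToyCatalogue`, its method names and the second replica host `other.example.org` are invented for illustration.

```python
import uuid


class ToyCatalogue:
    """In-memory stand-in for the LFN -> GUID -> SURL mappings."""

    def __init__(self):
        self.guid_of = {}   # LFN or symlink -> GUID
        self.replicas = {}  # GUID -> list of SURLs

    def register(self, lfn: str, surl: str) -> str:
        """Create a new entry and return its GUID."""
        if lfn in self.guid_of:
            raise ValueError(f"LFN already registered: {lfn}")
        guid = f"guid:{uuid.uuid4()}"  # any UUID will do for this sketch
        self.guid_of[lfn] = guid
        self.replicas[guid] = [surl]
        return guid

    def add_replica(self, lfn: str, surl: str) -> None:
        """Record one more physical copy; the GUID stays the same."""
        self.replicas[self.guid_of[lfn]].append(surl)

    def symlink(self, lfn: str, alias: str) -> None:
        """Add a further LFN that points to the same GUID."""
        self.guid_of[alias] = self.guid_of[lfn]

    def list_replicas(self, lfn: str) -> list:
        """Return the SURLs of all replicas."""
        return list(self.replicas[self.guid_of[lfn]])


catalogue = ToyCatalogue()
main_lfn = "/grid/dteam/dir1/dir2/file1.root"
guid = catalogue.register(main_lfn, "srm://host.example.com/foo/bar")
catalogue.add_replica(main_lfn, "srm://other.example.org/foo/bar")
catalogue.symlink(main_lfn, "/grid/dteam/mydir/mylink")

# The symlink resolves to the same GUID and therefore the same replicas.
print(catalogue.list_replicas("/grid/dteam/mydir/mylink"))
```

Because every LFN and symlink resolves to one GUID, adding a replica under any of the names makes it visible under all of them.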
Job submission – example 1
[Figure: User Interface, WMS, CE, Worker Nodes]
Small files: InputSandbox / OutputSandbox
Data Management – example 2
[Figure: User Interface, WMS, CE, Worker Nodes, LFC and two SEs]
LFC commands
Interact with the catalogue only
lfc-chmod        Change access mode of the LFC file/directory
lfc-chown        Change owner and group of the LFC file/directory
lfc-delcomment   Delete the comment associated with the file/directory
lfc-getacl       Get file/directory access control lists
lfc-ln           Make a symbolic link to a file/directory
lfc-ls           List file/directory entries in a directory
lfc-mkdir        Create a directory
lfc-rename       Rename a file/directory
lfc-rm           Remove a file/directory
lfc-setacl       Set file/directory access control lists
lfc-setcomment   Add/replace a comment
lcg-utils commands
Copy files to/from/between SEs.
Keep the SEs and the Catalogue up to date.
The RPM containing these tools (lcg_util) is installed in the WNs and UIs.
lcg-cp    Copies a grid file to a local destination
lcg-cr    Copies a file to a SE and registers the file in the catalog
lcg-del   Delete one file
lcg-rep   Replication between SEs and registration of the replica
lcg-gt    Gets the TURL for a given SURL and transfer protocol
lcg-sd    Sets file status to “Done” for a given SURL in a SRM request
Copying and registering a file
lcg-cr --vo <vo name> -l <LFN destination> -d <SE> <local file>
lcg-cr
Copies a file to a SE and registers the file in the catalogue
This command will return the GUID for your file
Make sure to have a directory in the LFC ( /grid/gilda/users/sagrid/yourname/ )
Use the lcg-info or lcg-infosites commands to figure out the available SEs:
lcg-infosites --vo gilda se
Avail Space(Kb)  Used Space(Kb)  Type  SEs
----------------------------------------------------------
1100000000       1145007         n.a   gilda-se.rediris.es
1030000000       32              n.a   fn2.hpcc.sztaki.hu
295250000        75945624        n.a   aliserv6.ct.infn.it
n.a              999999          n.a   se-edu.grid.acad.bg
60440000         3280565         n.a   iceage-se-01.ct.infn.it
1008437          8844236         n.a   se.hpc.iit.bme.hu
53160000         440416          n.a   vega-se.ct.infn.it
2430000000       440450          n.a   se1-egee.srce.hr
97890000         440423          n.a   dgt02.ui.savba.sk
lcg-cr --vo gilda -l lfn:/grid/gilda/tutorials/yourname/yourfile.txt -d aliserv6.ct.infn.it file://$HOME/alien.txt
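When choosing a destination SE from a script, the lcg-infosites listing can be parsed instead of read by eye. The sketch below assumes the four-column layout shown in this tutorial (the real output may differ between versions) and works on a pasted copy of the listing rather than calling the command; `parse_infosites` is a hypothetical helper.

```python
# Listing copied from the tutorial; on a real UI it would come from
# running "lcg-infosites --vo gilda se".
SAMPLE = """\
Avail Space(Kb) Used Space(Kb) Type SEs
----------------------------------------------------------
1100000000 1145007 n.a gilda-se.rediris.es
1030000000 32 n.a fn2.hpcc.sztaki.hu
295250000 75945624 n.a aliserv6.ct.infn.it
n.a 999999 n.a se-edu.grid.acad.bg
60440000 3280565 n.a iceage-se-01.ct.infn.it
1008437 8844236 n.a se.hpc.iit.bme.hu
53160000 440416 n.a vega-se.ct.infn.it
2430000000 440450 n.a se1-egee.srce.hr
97890000 440423 n.a dgt02.ui.savba.sk
"""


def to_kb(field: str):
    """Convert a size column to int, or None when it is reported as n.a."""
    return None if field == "n.a" else int(field)


def parse_infosites(text: str) -> list:
    """Return one dict per SE row, skipping the header and ruler lines."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Data rows have exactly four fields; header and ruler do not.
        if len(fields) != 4:
            continue
        avail, used, _se_type, host = fields
        rows.append({"host": host, "avail_kb": to_kb(avail), "used_kb": to_kb(used)})
    return rows


rows = parse_infosites(SAMPLE)
# Ignore SEs that do not publish their available space.
candidates = [row for row in rows if row["avail_kb"] is not None]
best = max(candidates, key=lambda row: row["avail_kb"])
print(best["host"], best["avail_kb"])
```

Free space is only one possible criterion; in practice the SE also has to support your VO and be reachable from where the job runs.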
Replicate a file between SEs
Basic usage:
lcg-rep --vo <vo name> -d <destination SE> <LFN of your file>
Try it:
lcg-rep --vo gilda -d gilda-se.rediris.es lfn:/grid/gilda/tutorials/yourname/yourfile.txt
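For scripted workflows, the two command lines can be assembled programmatically. This is a sketch that only builds the argument lists, using the options shown in this tutorial (--vo, -l, -d); the helper names `lcg_cr_args` and `lcg_rep_args` and the local path `/home/user/alien.txt` are made up, and nothing is executed here.

```python
import shlex


def lcg_cr_args(vo: str, lfn: str, se: str, local_path: str) -> list:
    """Arguments for copying a local file to an SE and registering it."""
    return ["lcg-cr", "--vo", vo, "-l", lfn, "-d", se, f"file://{local_path}"]


def lcg_rep_args(vo: str, se: str, lfn: str) -> list:
    """Arguments for replicating an already registered file to another SE."""
    return ["lcg-rep", "--vo", vo, "-d", se, lfn]


LFN = "lfn:/grid/gilda/tutorials/yourname/yourfile.txt"

upload = lcg_cr_args("gilda", LFN, "aliserv6.ct.infn.it", "/home/user/alien.txt")
replicate = lcg_rep_args("gilda", "gilda-se.rediris.es", LFN)

# Print the commands in copy-and-paste form.
print(shlex.join(upload))
print(shlex.join(replicate))
```

On a UI where lcg_util is installed, each list could be passed to `subprocess.run(args, check=True)`; passing a list rather than a single string avoids shell quoting problems with unusual file names.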
Listing the replicas
Use the same lcg-lr command used previously:
lcg-lr --vo <vo name> <LFN of your file>
The command will return the SURLs of all replicas
A file can be stored on multiple SEs so that a running job can download it from the closest SE.