PPT on Data Management System

Published in: Business, Technology
  • Be the first to comment

  • Be the first to like this

No Downloads
Total views
On SlideShare
From Embeds
Number of Embeds
Embeds 0
No embeds

No notes for slide
  • Service-oriented grid middleware
  • When a user submits a job to the Grid, its status changes according to the following state machine
  • Slide inherited from EDG – European Data Grid
  • Larocca

    1. The gLite WMS and the Data Management System
        Giuseppe LA ROCCA, INFN Catania, giuseppe.larocca@ct.infn.it
        Master Class for Life Science, 4-6 May 2010, Singapore
    2. Outline
        • An introduction to the gLite WMS
          – Job Submission via WMS
          – Command line interface
          – Job status
        • The Job Description Language overview
          – JDL attributes
        • The gLite DMS
          – The Storage Resource Manager (SRM)
        • Grid file referencing schemes
        • LFC File Catalogue
          – Architecture
          – LFC commands
        • File & Replica Management Client Tools
        • Run bioinformatics applications via Grid portal
    3. The gLite stack: overview
    4. Overview of the WMS
        • The Workload Management System (WMS) is the gLite 3 component that allows users to submit jobs and performs all the tasks required to execute them, without exposing the user to the complexity of the Grid.
        • The WMS comprises a set of Grid middleware components responsible for the distribution and management of tasks across Grid resources.
          – The Workload Manager (WM) accepts and satisfies the job management requests coming from its clients.
            • The WM passes the job to an appropriate CE for execution, taking into account the requirements and preferences expressed in the job description.
            • The decision about which resource should be used is the outcome of a matchmaking process.
          – The Logging and Bookkeeping service tracks the jobs managed by the WMS. It collects events from many WMS components and records the status and history of each job.
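The matchmaking step can be pictured as "filter, then order". The sketch below is a toy model, not the WMS implementation: it keeps the CEs whose published attributes satisfy a Requirements-like predicate and sorts the survivors by a preference function. The CE records, host names and the `free_cpus` key are hypothetical; only the two Glue attribute names are taken from the JDL examples in this deck.

```python
# Toy matchmaking sketch (not the WMS code): filter CEs with a
# Requirements-like predicate, then order them by a preference function.
# CE records, host names and "free_cpus" are hypothetical.
from typing import Any, Callable, Dict, List

CE = Dict[str, Any]


def matchmake(ces: List[CE],
              requirements: Callable[[CE], bool],
              rank: Callable[[CE], float]) -> List[str]:
    """Return the IDs of the eligible CEs, most preferred first."""
    eligible = [ce for ce in ces if requirements(ce)]
    eligible.sort(key=rank, reverse=True)
    return [str(ce["GlueCEUniqueID"]) for ce in eligible]


ces: List[CE] = [
    {"GlueCEUniqueID": "ce1.example.org:2119/jobmanager-pbs-short",
     "GlueCEInfoLRMSType": "PBS", "GlueCEInfoTotalCPUs": 8, "free_cpus": 2},
    {"GlueCEUniqueID": "ce2.example.org:2119/jobmanager-lsf-long",
     "GlueCEInfoLRMSType": "LSF", "GlueCEInfoTotalCPUs": 64, "free_cpus": 40},
    {"GlueCEUniqueID": "ce3.example.org:2119/jobmanager-pbs-long",
     "GlueCEInfoLRMSType": "PBS", "GlueCEInfoTotalCPUs": 16, "free_cpus": 10},
]


def pbs_multi_cpu(ce: CE) -> bool:
    # Same condition as: other.GlueCEInfoLRMSType == "PBS" && other.GlueCEInfoTotalCPUs > 1
    return ce["GlueCEInfoLRMSType"] == "PBS" and int(ce["GlueCEInfoTotalCPUs"]) > 1


# Prefer the CE with more free CPUs (a made-up preference for illustration).
print(matchmake(ces, pbs_multi_cpu, rank=lambda ce: float(ce["free_cpus"])))
# ['ce3.example.org:2119/jobmanager-pbs-long', 'ce1.example.org:2119/jobmanager-pbs-short']
```

The LSF CE is discarded by the requirement, and the two PBS CEs are returned in preference order.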
    5. Job Submission via WMS
        [Figure: the GILDA User Interface creates a proxy through the VO Management Service (DB of VO users); a Grid Site with a Computing Element and a Storage Element]
    6. Job Submission via WMS
        [Figure, continued: the user writes the JDL and submits the job (executable + small inputs) to the Workload Management System, which queries the Information System; the Grid Site publishes its state]
    7. Job Submission via WMS
        [Figure, continued: the WMS submits the job to the Computing Element; job events are logged to the Logging and Bookkeeping service]
    8. Job Submission via WMS
        [Figure, continued: the user retrieves the job status from Logging and Bookkeeping and retrieves the (small) output files]
    9. The Command Line Interface
        • The gLite WMS implements two different services to manage jobs: the Network Server and the WMProxy.
          – The recommended way to manage jobs is through the gLite WMS via WMProxy, because it gives the best performance and allows users to exploit the most advanced functionality.
        • Among other things, WMProxy provides:
          – submission of job collections;
          – faster authentication;
          – faster matchmaking;
          – faster response times for users;
          – higher job throughput.
    10. Proxy Delegation
        To explicitly delegate a user proxy to WMProxy, the command to use is:
          glite-wms-job-delegate-proxy -d <delegID>
        Example:
          $ glite-wms-job-delegate-proxy -d mydelegID
          Connecting to the service
          glite-wms-job-delegate-proxy Success ========
          Your proxy has been successfully delegated to the WMProxy:
          the delegation identifier: mydelegID
          =====================================================
    11. Job Submission
        Starting from a simple JDL file, we can submit it via WMProxy with:
          $ glite-wms-job-submit -d mydelegID test.jdl
          Connecting to the service
          glite-wms-job-submit Success ========
          The job has been successfully submitted to the WMProxy
          Your job identifier is:
    12. Listing the CE(s) matching a job
        It is possible to see which CEs are eligible to run a job described by a given JDL using:
          $ glite-wms-job-list-match -d mydelegID test.jdl
          Connecting to the service
          ELEMENT IDs LIST
          The following CE(s) matching your job requirements have been found:
          *CEId*
          -
    13. Retrieving the status of a job
          $ glite-wms-job-status
          INFORMATION:
          Status info for the Job :
          Status: Done (Success)
          Exit code: 0
          Status Reason: Job terminated successfully
          Destination:
          Mon Dec 4 15:05:43 2006 CET
          =====================================================
        The verbosity level controls the amount of information provided; the value of the -v option ranges from 0 to 3.
        The job status command accepts several jobIDs as arguments, i.e.: glite-wms-job-status <jobID1> ... or, more conveniently, the -i <file path> option can be used to
    14. Retrieving the output(s)
          $ glite-wms-job-output
          Connecting to the service https://
          GET OUTPUT OUTCOME
          Output sandbox files for the job:
          have been successfully retrieved and stored in the directory:
          /tmp/doe_yabp72aERhofLA6W2-LrJw
          =====================================================
        The default location for storing the outputs (normally /tmp) is defined in the UI configuration, but it is possible to specify the directory in which to save the output using the --dir <path name> option.
    15. Cancelling a job
          $ glite-wms-job-cancel
          Are you sure you want to remove specified job(s) [y/n]y : y
          Connecting to the service https://
          glite-wms-job-cancel Success ============
          The cancellation request has been successfully submitted for the following job(s):
          -
        If the cancellation is successful, the job will terminate in status CANCELLED.
    16. Job Submission with CLI
        [Figure: job management from the GILDA User Interface, with the VO Management Service (DB of VO users) and a Grid Site (Computing Element, Storage Element); the commands are used in this order]
          voms-proxy-init --voms gilda
          glite-wms-job-delegate-proxy -d delegID
          glite-wms-job-list-match -d delegID hostname.jdl
          glite-wms-job-submit -d delegID hostname.jdl   (returns the JobID)
          glite-wms-job-status JobID
          glite-wms-job-output JobID
    17. Possible Job states
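The state diagram itself did not survive extraction. As a rough stand-in, the sketch below models a simplified job life cycle, assuming the standard gLite state names (Submitted, Waiting, Ready, Scheduled, Running, Done, Aborted, Cancelled, Cleared) and only the main transitions; the real state machine may contain further edges, for example resubmission after a failure. The `replay` helper is hypothetical.

```python
# Simplified model of the gLite job life cycle. Assumptions: standard gLite
# state names and only the main transitions; the real diagram may contain
# further edges (for example resubmission after a failure).
from enum import Enum
from typing import Dict, Iterable, Set


class JobState(Enum):
    SUBMITTED = "Submitted"
    WAITING = "Waiting"
    READY = "Ready"
    SCHEDULED = "Scheduled"
    RUNNING = "Running"
    DONE = "Done"
    ABORTED = "Aborted"
    CANCELLED = "Cancelled"
    CLEARED = "Cleared"


S = JobState
TRANSITIONS: Dict[JobState, Set[JobState]] = {
    S.SUBMITTED: {S.WAITING, S.ABORTED, S.CANCELLED},
    S.WAITING: {S.READY, S.ABORTED, S.CANCELLED},
    S.READY: {S.SCHEDULED, S.ABORTED, S.CANCELLED},
    S.SCHEDULED: {S.RUNNING, S.ABORTED, S.CANCELLED},
    S.RUNNING: {S.DONE, S.ABORTED, S.CANCELLED},
    S.DONE: {S.CLEARED},  # output sandbox retrieved by the user
    S.ABORTED: set(),
    S.CANCELLED: set(),
    S.CLEARED: set(),
}


def replay(events: Iterable[JobState]) -> JobState:
    """Start from SUBMITTED and apply each event, rejecting unknown transitions."""
    state = S.SUBMITTED
    for nxt in events:
        if nxt not in TRANSITIONS[state]:
            raise ValueError(f"illegal transition {state.value} -> {nxt.value}")
        state = nxt
    return state


print(replay([S.WAITING, S.READY, S.SCHEDULED, S.RUNNING, S.DONE, S.CLEARED]).value)  # Cleared
```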
    18. Job Description Language
        • The Job Description Language (JDL) is a high-level language based on the Classified Advertisement (ClassAd) language, used to describe jobs and aggregates of jobs with arbitrary dependency relations.
          – The JDL is used in WLCG/EGEE to specify the desired job characteristics and constraints, which the WMS takes into account to select the best resource to execute the job.
          – A job description is a file (called a JDL file) consisting of lines of the form: attribute = expression;
          – An expression can span several lines, but only the last line must be terminated by a semicolon.
    19. Job Description Language
        • The single-quote character (') cannot be used in the JDL.
        • Comments must be preceded by a sharp character (#) or a double slash (//) at the beginning of each line.
        • Multi-line comments must be enclosed between "/*" and "*/".
        Attention! The JDL is sensitive to blank characters and tabs. No blank characters or tabs should follow the semicolon at the end of a line.
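One way to respect these syntax rules when generating JDL files from a script is to let a small writer enforce them. The sketch below is a hypothetical helper, not a gLite tool: it emits one `attribute = expression;` line per attribute with nothing after the semicolon and refuses the single-quote character. It always quotes string values, so expressions such as Requirements are outside its scope.

```python
# Hypothetical JDL writer (not part of gLite): one "attribute = expression;"
# per line, nothing after the semicolon, and no single-quote character.
# String values are always quoted, so Requirements-style expressions are out of scope.
from typing import Dict, List, Union

Value = Union[str, int, List[str]]


def to_expr(value: Value) -> str:
    if isinstance(value, bool):  # check bool before int (bool is an int subclass)
        return "true" if value else "false"
    if isinstance(value, int):
        return str(value)
    if isinstance(value, list):
        return "{" + ", ".join(f'"{v}"' for v in value) + "}"
    return f'"{value}"'


def write_jdl(attrs: Dict[str, Value]) -> str:
    lines = []
    for name, value in attrs.items():
        line = f"{name} = {to_expr(value)};"
        if "'" in line:
            raise ValueError(f"the ' character is not allowed in JDL: {line}")
        lines.append(line)  # no blanks or tabs are appended after ';'
    return "\n".join(lines) + "\n"


print(write_jdl({
    "Executable": "/bin/hostname",
    "StdOutput": "std.out",
    "StdError": "std.err",
    "OutputSandbox": ["std.out", "std.err"],
}), end="")
```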
    20. Simple JDL example
          Executable = "/bin/hostname";
          StdOutput = "std.out";
          StdError = "std.err";
        The Executable attribute specifies the command to be run by the job. If the command is already present on the WN, it must be expressed as an absolute path; if it has to be copied from the UI, only the file name must be specified, and the path of the command on the UI should be given in the InputSandbox attribute:
          Executable = "";
          InputSandbox = {"/home/larocca/"};
          StdOutput = "std.out";
          StdError = "std.err";
    21. • The Arguments attribute can contain a string value, which is taken as the argument list for the executable:
            Arguments = "fileA 10";
        • In the Executable and Arguments attributes it may be necessary to use special characters, such as &, \, |, >, <. These characters should be preceded by a triple \ in the JDL, or specified inside quoted strings, e.g.:
            Arguments = "-f file1&file2";
        • The shell environment of the job can be modified using the Environment attribute:
            Environment = {"CMS_PATH=$HOME/cms"};
    22. • If files have to be copied from the UI to the execution node, they must be listed in the InputSandbox attribute:
            InputSandbox = {"", ... ,"fileN"};
        • The files to be transferred back to the UI after the job has finished can be specified using the OutputSandbox attribute:
            OutputSandbox = {"std.out","std.err"};
        • Wildcards are allowed only in the InputSandbox attribute.
        • Absolute paths cannot be specified in the OutputSandbox attribute.
        • The InputSandbox cannot contain two files with the same name, even if they have different absolute paths, because they would overwrite each other when transferred.
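These sandbox rules are easy to check before submitting. The function below is a hypothetical pre-submission check (not part of the gLite CLI) that reports the three violations listed above: a file name repeated in the InputSandbox, a wildcard in the OutputSandbox, and an absolute path in the OutputSandbox.

```python
# Hypothetical pre-submission check for the sandbox rules (not a gLite tool).
import posixpath
from typing import Dict, List


def check_sandboxes(input_sb: List[str], output_sb: List[str]) -> List[str]:
    """Return a list of human-readable rule violations (empty if none)."""
    problems: List[str] = []
    seen: Dict[str, str] = {}
    for path in input_sb:
        name = posixpath.basename(path)
        if name in seen:  # same file name would overwrite on transfer
            problems.append(f"InputSandbox: {path} and {seen[name]} share the name {name}")
        else:
            seen[name] = path
    for path in output_sb:
        if any(ch in path for ch in "*?["):
            problems.append(f"OutputSandbox: wildcard not allowed in {path}")
        if path.startswith("/"):
            problems.append(f"OutputSandbox: absolute path not allowed: {path}")
    return problems


print(check_sandboxes(["/a/x.txt", "/b/x.txt"], ["/tmp/std.out", "*.log"]))
```

Each of the three entries in the example breaks a different rule, so three problems are reported.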
    23. • The Requirements attribute can be used to express constraints on the resources where the job should run.
          – Its value is a Boolean expression that must evaluate to true for the job to run on a specific CE.
        • Note: only one Requirements attribute can be specified (if there is more than one, only the last one is considered). If several conditions must be applied to the job, they must all be combined in a single Requirements attribute.
        • For example, suppose the user wants to run on a CE that uses PBS as its batch system and has more than one CPU in total. The job description file would then contain:
            Requirements = other.GlueCEInfoLRMSType == "PBS" && other.GlueCEInfoTotalCPUs > 1;
    24. • The WMS can also be asked to send a job to a particular queue of a CE with the following expression:
            Requirements = other.GlueCEUniqueID == "";
        • It is also possible to use regular expressions when expressing a requirement.
          – Suppose, for example, that the user wants all of his jobs to run on any CE in a given domain. This can be achieved by putting the following expression in the JDL file:
            Requirements = RegExp("",other.GlueCEUniqueID);
          – The opposite can be required by using:
            Requirements = (!RegExp("", other.GlueCEUniqueID));
    25. • If the job must run on a CE where a particular piece of experiment software is installed, and this information is published by the CE, something like the following must be written:
            Requirements = Member("BLAST-1.0.3", other.GlueHostApplicationSoftwareRunTimeEnvironment);
        Note: the Member operator tests whether its first argument (a scalar value) is a member of its second argument (a list). The GlueHostApplicationSoftwareRunTimeEnvironment attribute is a list of strings used to publish any VO-specific information relative to the CE (typically, information on the VO software available on that CE).
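To make the semantics of Member and RegExp concrete, here are Python stand-ins evaluated against made-up CE records. The assumptions are that Member behaves like exact list membership and RegExp(pattern, target) like an unanchored regular-expression search; the case sensitivity in particular is a simplification of this sketch, not a statement about the ClassAd library. Domains and host names are hypothetical.

```python
# Python stand-ins for the ClassAd helpers used in Requirements expressions.
# Assumptions: Member = exact, case-sensitive list membership;
# RegExp(pattern, target) = unanchored regex search. Hosts are hypothetical.
import re
from typing import Any, Dict, List


def member(value: str, items: List[str]) -> bool:
    return value in items


def regexp(pattern: str, target: str) -> bool:
    return re.search(pattern, target) is not None


def requirements(ce: Dict[str, Any]) -> bool:
    # Roughly: Member("BLAST-1.0.3", other.GlueHostApplicationSoftwareRunTimeEnvironment)
    #          && !RegExp("\.example\.net", other.GlueCEUniqueID)
    return (member("BLAST-1.0.3", ce["GlueHostApplicationSoftwareRunTimeEnvironment"])
            and not regexp(r"\.example\.net", ce["GlueCEUniqueID"]))


ce_ok = {"GlueCEUniqueID": "ce01.example.org:2119/jobmanager-pbs-short",
         "GlueHostApplicationSoftwareRunTimeEnvironment": ["BLAST-1.0.3"]}
ce_excluded = {"GlueCEUniqueID": "ce02.example.net:2119/jobmanager-pbs-short",
               "GlueHostApplicationSoftwareRunTimeEnvironment": ["BLAST-1.0.3"]}
print(requirements(ce_ok), requirements(ce_excluded))  # True False
```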
    26. Advanced job types
        • Job Collection: a set of independent jobs that the user can submit and monitor as if they were a single job
          [
            Type = "Collection";
            nodes = {
              [
                Executable = "/bin/hostname";
                Arguments = "-f";
                StdOutput = "hostname.out";
                StdError = "hostname.err";
                OutputSandbox = {"hostname.err","hostname.out"};
              ],
              [
                Executable = "/bin/sh";
                Arguments = "";
                StdOutput = "povray.out";
                StdError = "povray.err";
                InputSandbox = {""};
                OutputSandbox = {"povray.err","povray.out"};
                Requirements = Member("POVRAY-3,5", other.GlueHostApplicationSoftwareRunTimeEnvironment);
              ]
            };
          ]
    27. Advanced job types
        • Parametric Job: a job collection where the jobs are identical except for the value of a running parameter
            JobType = "Parametric";
            Executable = "/bin/echo";
            Arguments = "_PARAM_";
            StdOutput = "myoutput_PARAM_.txt";
            StdError = "myerror_PARAM_.txt";
            Parameters = 3;
            ParameterStep = 1;
            ParameterStart = 1;
            OutputSandbox = {"myoutput_PARAM_.txt"};
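The effect of a parametric job can be sketched as template expansion: one job per parameter value, with every `_PARAM_` occurrence replaced. This is an illustration only. The list of values is passed in explicitly, because the exact way the WMS derives it from Parameters, ParameterStart and ParameterStep is not modelled here; `expand_parametric` is a hypothetical helper.

```python
# Illustrative expansion of a parametric job template (hypothetical helper).
# The parameter values are given explicitly; deriving them from
# Parameters/ParameterStart/ParameterStep is deliberately not modelled.
from typing import Any, Dict, Iterable, List


def expand_parametric(template: Dict[str, Any], values: Iterable[Any]) -> List[Dict[str, Any]]:
    jobs: List[Dict[str, Any]] = []
    for v in values:
        job: Dict[str, Any] = {}
        for key, val in template.items():
            if isinstance(val, str):
                job[key] = val.replace("_PARAM_", str(v))
            elif isinstance(val, list):
                job[key] = [s.replace("_PARAM_", str(v)) for s in val]
            else:
                job[key] = val
        jobs.append(job)
    return jobs


template = {
    "Executable": "/bin/echo",
    "Arguments": "_PARAM_",
    "StdOutput": "myoutput_PARAM_.txt",
    "StdError": "myerror_PARAM_.txt",
    "OutputSandbox": ["myoutput_PARAM_.txt"],
}
for job in expand_parametric(template, [1, 2]):
    print(job["Arguments"], job["StdOutput"])
```

The template itself is left untouched; each generated job is a fresh dictionary.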
    28. Advanced job types
        • DAG: a set of jobs where the input, output, or execution of one or more jobs depends on one or more other jobs
          – The jobs are nodes (vertices) in the graph
          – The edges (arcs) identify the dependencies
        [Figure: dependency graph in which nodeA precedes nodeB, nodeC and nodeF, which in turn all precede nodeD]
            Type = "dag";
            max_nodes_running = 5;
            InputSandbox = {"/tmp/foo/*.exe", "/home/larocca/bar", "gsi ", "file:///tmp/myconf"};
            InputSandboxBaseURI = "gsi";
            nodes = [
              nodeA = [
                description = [
                  JobType = "Normal";
                  Executable = "a.exe";
                  InputSandbox = { "/home/larocca/myfile.txt", root.InputSandbox };
                ];
              ];
              nodeF = [
                description = [
                  JobType = "Normal";
                  Executable = "b.exe";
                  Arguments = "1 2 3";
                  OutputSandbox = { "myoutput.txt", "myerror.txt" };
                ];
              ];
              nodeD = [
                description = [
                  JobType = "Checkpointable";
                  Executable = "b.exe";
                  Arguments = "1 2 3";
                  InputSandbox = { "file:///home/larocca/data.txt", root.nodes.nodeF.description.OutputSandbox[0] };
                ];
              ];
              nodeC = [ file = "/home/larocca/nodec.jdl"; ];
              nodeB = [ file = "foo.jdl"; ];
            ];
            dependencies = { { nodeA, nodeB }, { nodeA, nodeC }, { nodeA, nodeF }, { { nodeB, nodeC, nodeF }, nodeD } };
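The `dependencies` list fully determines a valid execution order. The sketch below derives one with a Kahn-style topological sort, releasing at most `max_nodes_running` ready nodes per round. It is an illustration of the dependency semantics under a simplified "rounds" model, not the WMS scheduler, which releases nodes as soon as their parents complete. `schedule_rounds` is a hypothetical helper.

```python
# Kahn-style topological scheduling of the DAG's "dependencies" list,
# releasing at most max_running ready nodes per round. Simplified model,
# not the WMS scheduler.
from collections import defaultdict
from typing import Dict, List, Set, Tuple, Union

Edge = Tuple[Union[str, Tuple[str, ...]], str]


def schedule_rounds(nodes: List[str], deps: List[Edge], max_running: int) -> List[List[str]]:
    parents: Dict[str, Set[str]] = defaultdict(set)
    for before, after in deps:
        group = before if isinstance(before, tuple) else (before,)
        parents[after].update(group)  # a node waits for every node in "before"
    done: Set[str] = set()
    rounds: List[List[str]] = []
    while len(done) < len(nodes):
        ready = [n for n in nodes if n not in done and parents[n] <= done]
        if not ready:
            raise ValueError("dependency cycle detected")
        batch = ready[:max_running]
        rounds.append(batch)
        done.update(batch)
    return rounds


nodes = ["nodeA", "nodeB", "nodeC", "nodeD", "nodeF"]
deps: List[Edge] = [("nodeA", "nodeB"), ("nodeA", "nodeC"), ("nodeA", "nodeF"),
                    (("nodeB", "nodeC", "nodeF"), "nodeD")]
print(schedule_rounds(nodes, deps, max_running=5))
# [['nodeA'], ['nodeB', 'nodeC', 'nodeF'], ['nodeD']]
```

With `max_running=2` the middle round is split in two, which is the effect a lower `max_nodes_running` has on how many nodes run at once.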
    30. The gLite stack: overview
    31. Storage Elements
        • The Storage Element is the service that allows a user or an application to store data and retrieve it later.
        • The DMS provides services to locate, access and transfer files:
          – the user does not need to know the physical location of a file, just its logical file name;
          – files can be replicated or transferred to several locations (SEs) as needed;
          – files are shared with all the members of the given VO.
        • Files stored in an SE are write-once, read-many:
          – files cannot be changed unless they are removed or replaced.
    32. Protocols
        – The GSIFTP protocol offers the functionality of FTP, but with support for GSI. It is responsible for secure, fast and efficient file transfers to and from Storage Elements.
        – RFIO was developed to access tape archiving systems, such as CASTOR (CERN Advanced STORage manager), and comes in a secure and an insecure version.
        – The gsidcap protocol is the GSI-enabled version of the dCache native access protocol, dcap.
    33. Types of Storage Elements /1
        • In WLCG/EGEE, different types of Storage Elements are available:
        • CASTOR. It consists of a disk buffer front end to a tape mass storage system. A virtual file system (namespace) shields the user from the complexities of the underlying disk and tape setup. File migration between disk and tape is managed by a process called the "stager". The native storage protocol, the insecure RFIO, allows access to files in the SE. Since the protocol is not GSI-enabled, RFIO access is only allowed from locations in the same LAN as the SE. With the proper modifications, the CASTOR disk buffer can also be used as a disk-only storage system.
    34. Types of Storage Elements /2
        • StoRM. It has been designed to support space reservation and direct access (native POSIX I/O calls), as well as other standard libraries (like RFIO).
        • StoRM takes advantage of high-performance parallel file systems like GPFS (from IBM).
          – In addition, standard POSIX file systems are supported (XFS from SGI and ext3).
        • StoRM relies on the ACL support provided by the underlying file systems to implement its security model.
    35. Types of Storage Elements /3
        • dCache. It consists of a server and one or more pool nodes. The server is the single point of access to the SE and presents the files in the pool disks under a single virtual file system tree. Nodes can be dynamically added to the pool. The native gsidcap protocol allows POSIX-like data access. dCache is widely employed as a disk buffer front end to many mass storage systems, like HPSS and Enstore, as well as a disk-only storage system.
        • LCG Disk Pool Manager (DPM). It is a lightweight disk pool manager, suitable for relatively small sites (max 10 TB of total space). Disks can be added dynamically to the pool at any time. As in dCache and CASTOR, a virtual file system hides the complexity of the disk pool architecture. The secure RFIO protocol allows file access from the WAN.
    36. The Storage Resource Manager (SRM)
    37. The Storage Resource Manager
        The Storage Resource Manager (SRM) has been designed to be the single interface for the management of disk and tape storage resources.
        Every type of Storage Element in WLCG/EGEE offers an SRM interface, except for the Classic SE, which is being phased out.
        SRM hides the complexity of the resource setup behind it and allows the user to request files, keep them on a disk buffer for a specified lifetime, reserve space for new entries, and so on.
          – In gLite, interactions with the SRM are hidden by high-level services (DM tools and APIs).
    38. The gLite Storage Element
    39. Grid file referencing schemes: LFN, GUID, SURL, TURL
        • Logical File Name (LFN)
          – lfn:/grid/gilda/tutorials/input-file
        • Grid Unique IDentifier (GUID)
          – guid:4d57edef-fa5c-4512-a345-1c838916b357
        • Storage URL (SURL), for a specific replica on a specific Storage Element
          – srm:// b366f371-b2c0-485d-b12c-c114edaf4db4
          – sfn://
        • Transport URL (TURL), for a specific replica on an SE, with a specific protocol
          – gsi eb366f371-b2c0-485d-b12c-c114edaf4db4
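The four kinds of reference can usually be told apart by their URL scheme. The sketch below classifies a reference accordingly; the scheme table is an assumption based on the examples in this deck (lfn: and guid: prefixes, srm:// and sfn:// SURLs, and the transfer protocols gsiftp, gsidcap and rfio for TURLs). Anything else, such as a local file: URL, is reported as unknown.

```python
# Classify a Grid file reference by its URL scheme. The scheme table is an
# assumption based on the examples in this deck; other schemes are "unknown".
from typing import Optional

KINDS = {
    "lfn": "LFN",
    "guid": "GUID",
    "srm": "SURL",
    "sfn": "SURL",
    "gsiftp": "TURL",
    "gsidcap": "TURL",
    "rfio": "TURL",
}


def reference_kind(ref: str) -> Optional[str]:
    scheme, sep, _ = ref.partition(":")
    if not sep:  # no scheme at all, e.g. a plain local path
        return None
    return KINDS.get(scheme.lower())


for ref in ["lfn:/grid/gilda/tutorials/input-file",
            "guid:4d57edef-fa5c-4512-a345-1c838916b357",
            "file:/home/larocca/file1"]:
    print(ref, "->", reference_kind(ref))
```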
    40. LCG File Catalog
        [Figure: symlinks (additional LFNs) point to the LFN, which maps to a GUID; the Replica Catalog maps the GUID to one or more SURLs; the SRM interface provides a TURL for a SURL, for various protocols: gsiftp, gsidcap, rfio]
    41. Needles in a haystack
        • How do I keep track of all the files I have on the Grid?
        • How does the Grid keep track of the mapping between LFN(s), GUID and SURL(s)?
        LFC File Catalogue: LFC = LCG File Catalogue; LCG = LHC Computing Grid; LHC = Large Hadron Collider
        • The LCG File Catalogue is the service that maintains the mappings between LFN(s), GUID and SURL(s).
    42. LFC File Catalogue
        • It consists of a single catalogue, where the LFN is the main key. Further LFNs can be added as symlinks to the main LFN.
          – It looks like a "top-level" directory in the Grid.
          – For each supported VO, a separate subdirectory exists under the "/grid" directory.
          – All the members of the VO have read/write permissions.
          – System metadata are supported, while for user metadata only a single string entry is available.
        • The catalogue publishes its endpoint in the Information Service so that it can be discovered by Data Management tools and other services (the WMS, for example).
    43. Architecture of the LFC Catalogue
        • The LFN acts as the main key in the database. It has:
          – symbolic links to it (additional LFNs);
          – system metadata;
          – information on replicas;
          – one field of user metadata;
          – Access Control Lists;
          – integration with VOMS (VirtualID and VirtualGID);
          – a C API.
    44. Before starting...
        • Users can interact with the file catalogue through CLIs and APIs.
          – The environment variable LFC_HOST must contain the host name of the LFC server to be used.
        • The directory structure of the LFC namespace has the form: /grid/<VO>/<subpaths>
          – Users of a given VO have read and write permissions only under the corresponding <VO> subdirectory.
    45. LFC Commands
          lfc-chmod        Change access mode of the LFC file/directory
          lfc-chown        Change owner and group of the LFC file/directory
          lfc-delcomment   Delete the comment associated with the file/directory
          lfc-getacl       Get file/directory access control lists
          lfc-ln           Make a symbolic link to a file/directory
          lfc-ls           List file/directory entries in a directory
          lfc-mkdir        Create a directory
          lfc-rename       Rename a file/directory
          lfc-rm           Remove a file/directory
          lfc-setacl       Set file/directory access control lists
          lfc-setcomment   Add/replace a comment
    46. lfc-ls
        • Listing the entries of an LFC directory:
          – lfc-ls [-cdiLlRTu] [--class] [--comment] [--deleted] [--display_side] [--ds] path…
          – where path specifies the LFN pathname (mandatory)
          – Remember that the LFC has a directory tree structure: /grid/<VO_name>/<you create it>, where /grid/<VO_name> is the LFC namespace and the rest is defined by the user
          – All members of a VO have read-write permissions under their directory
          – You can set LFC_HOME to use relative paths
        Examples:
          lfc-ls /grid/gilda/tutorials/taipei02
          export LFC_HOME=/grid/gilda/tutorials
          lfc-ls -l taipei02
          lfc-ls -l -R /grid
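The interplay between LFC_HOME and relative paths can be sketched in a few lines: a relative path is joined onto LFC_HOME, while an absolute one is used as-is. The helpers below are hypothetical (they are not part of the LFC client) and add a check that the resolved path stays inside the /grid/<VO>/ namespace where the VO has write access.

```python
# Hypothetical helpers (not the LFC client): resolve a path against LFC_HOME
# and check that it stays inside the /grid/<VO> namespace.
import posixpath
from typing import Optional


def resolve_lfc_path(path: str, lfc_home: Optional[str]) -> str:
    if path.startswith("/"):
        return posixpath.normpath(path)
    if not lfc_home:
        raise ValueError("relative path given but LFC_HOME is not set")
    return posixpath.normpath(posixpath.join(lfc_home, path))


def in_vo_namespace(path: str, vo: str) -> bool:
    prefix = f"/grid/{vo}"
    return path == prefix or path.startswith(prefix + "/")


p = resolve_lfc_path("taipei02", "/grid/gilda/tutorials")
print(p, in_vo_namespace(p, "gilda"))  # /grid/gilda/tutorials/taipei02 True
```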
    47. lfc-mkdir
        • Creating directories in the LFC:
          – lfc-mkdir [-m mode] [-p] path...
        • where path specifies the LFC pathname.
        • Remember that when registering a new file (using lcg-cr, for example) the corresponding destination directory must be created in the catalogue beforehand.
        • Example:
            lfc-mkdir /grid/gilda/<YOUR_DIRECTORY>    (<YOUR_DIRECTORY> is created by the user)
    48. lfc-ln
        • Creating a symbolic link:
          – lfc-ln -s file linkname
          – lfc-ln -s directory linkname
          – creates a link named linkname to the specified file or directory
        Example (original file, then symbolic link):
          – lfc-ln -s /grid/gilda/test /grid/gilda/aLink
        Let's check the link using lfc-ls with long listing:
          – lfc-ls -l aLink
            lrwxrwxrwx 1 19122 1077 0 Jun 14 11:58 aLink -> /grid/gilda/test
    49. Access Control List (ACL)
        • The LFC allows attaching an access control list (ACL) to a file or directory: a list of permissions specifying who is allowed to access or modify it. The permissions are very much like those of a UNIX file system: read (r), write (w) and execute (x).
        • In the LFC, users and groups are internally identified by numerical virtual uids and virtual gids, which are virtual in the sense that they exist only in the LFC namespace.
          – A user can be specified as a name, as a virtual uid or as a DN.
          – A group can be specified as a name, as a virtual gid or as a VOMS FQAN.
        • A directory in the LFC also has a default ACL (the ACL associated with any file or directory created under that directory). After creation, the ACLs can be freely changed.
          – When a sub-directory is created, its default ACL is inherited from the parent directory.
    50. Print the ACL of a directory
          $ lfc-getacl /grid/gilda/tutorials/test-acl
          # file: /grid/gilda/tutorials/test-acl
          # owner: /C=IT/O=INFN/OU=Personal Certificate/L=Catania/CN=Giuseppe La Rocca/Email=
          # group: gilda
          user::rwx
          group::rwx    #effective:rwx
          other::r-x
          default:user::rwx
          default:group::rwx
          default:other::r-x
        In this example, the owner and all users in the gilda group have full privileges on the directory, while other users cannot write into it.
    51. Modify the ACL
          lfc-setacl [-d] [-m] [-s] acl_entries path
        The -m option means that we are modifying the existing ACL. The other options of lfc-setacl are -d, to remove ACL entries, and -s, to replace the complete set of ACL entries.
        acl_entries is a comma-separated list of entries. Each entry has colon-separated fields: ACL type, id (uid or gid), permission. Only directories can have default ACL entries!
        The entries look like:
          user::perm              default:user::perm
          user:uid:perm           default:user:uid:perm
          group::perm             default:group::perm
          group:gid:perm          default:group:gid:perm
          mask::perm              default:mask::perm
          other::perm             default:other::perm
    52. Modify the ACL of a directory
        Let's change the default ACL, giving read/write permission to user and group, and no privileges to others.
          – The syntax we apply here is modify (-m) default (d:) for user (u:), and likewise for group (g:) and others (o:).
          $ lfc-setacl -m d:u::6,d:g::6,d:o::0 $LFC_HOME/test-acl/
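The compact form used with lfc-setacl packs a lot into a few characters. The parser below is a hypothetical helper (not part of the LFC tools) that expands such entries into named fields. Its assumptions: the abbreviations d, u, g, m and o, the three-field type:id:perm layout from the entry list of the previous slide, and a permission given either as one octal digit (r=4, w=2, x=1) or already in rwx form.

```python
# Hypothetical parser for lfc-setacl-style ACL entries (not an LFC tool).
# Assumes type:id:perm fields, an optional d:/default: prefix, and a
# permission given as one octal digit (r=4, w=2, x=1) or as an rwx string.
from typing import NamedTuple

TYPES = {"u": "user", "user": "user", "g": "group", "group": "group",
         "m": "mask", "mask": "mask", "o": "other", "other": "other"}


class AclEntry(NamedTuple):
    default: bool
    kind: str
    ident: str  # empty for owner, owning group, mask and other
    perm: str   # always rendered in rwx form


def perm_to_rwx(perm: str) -> str:
    if perm.isdigit():
        bits = int(perm, 8)
        return "".join(ch if bits & bit else "-" for ch, bit in (("r", 4), ("w", 2), ("x", 1)))
    return perm


def parse_entry(entry: str) -> AclEntry:
    fields = entry.split(":")
    default = fields[0] in ("d", "default")
    if default:
        fields = fields[1:]
    if len(fields) != 3:
        raise ValueError(f"expected type:id:perm, got {entry!r}")
    kind, ident, perm = fields
    return AclEntry(default, TYPES[kind], ident, perm_to_rwx(perm))


for e in "d:u::6,d:g::6,d:o::0".split(","):
    print(parse_entry(e))
```

Under these assumptions, `d:u::6` expands to a default entry for the owner with `rw-`, and `d:o::0` to a default entry for others with no permissions.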
    53. Adding metadata information
        The lfc-setcomment and lfc-delcomment commands allow the user to associate a comment with a catalogue entry and to delete such a comment. This is the only user-defined metadata that can be associated with catalogue entries.
        The comments on files can be listed using the --comment option of the lfc-ls command, as shown in the following example:
          $ lfc-setcomment /grid/gilda/file1 "My metadata"
          $ lfc-ls --comment /grid/gilda/file1
          /grid/gilda/file1 My metadata
    54. LCG Data Management Client Tools
        • The LCG Data Management tools allow users to copy files between the UI, a WN and an SE, to register entries in the file catalogue and to replicate files between SEs.
          lcg-cp   Copies a Grid file to a local destination
          lcg-cr   Copies a file to an SE and registers it in the catalogue
          lcg-del  Deletes one file (either one replica or all the replicas)
          lcg-rep  Copies a file from one SE to another SE and registers it in the catalogue
          lcg-gt   Gets the TURL for a given SURL and transfer protocol
          lcg-aa   Adds an alias in the catalogue for a given GUID
          lcg-ra   Removes an alias in the catalogue for a given GUID
          lcg-rf   Registers in the catalogue a file residing on an SE
          lcg-uf   Unregisters in the catalogue a file residing on an SE
          lcg-la   Lists the aliases for a given LFN, GUID or SURL
          lcg-lg   Gets the GUID for a given LFN or SURL
          lcg-lr   Lists the replicas for a given LFN, GUID or SURL
    55. Environment variables /1
        • The --vo <vo name> option, which specifies the virtual organisation of the user, is present in all commands except lcg-gt. It is mandatory unless the variable LCG_GFAL_VO is set (e.g.: export LCG_GFAL_VO=gilda).
        Timeouts
        The commands lcg-cr, lcg-del, lcg-gt, lcg-rf, lcg-sd and lcg-rep all implement timeouts. With the -t option, the user can specify a timeout in seconds. The default is 0 seconds, i.e. no timeout. If an operation times out, all actions performed up to that moment are rolled back, so no broken files are left on an SE and no entries for non-existent files are left registered in the catalogues.
    56. Environment variables /2
        • For all lcg-* commands to work, the environment variable LCG_GFAL_INFOSYS must be set to point to a top BDII in the format <hostname>:<port>, so that the commands can retrieve the necessary information:
            export
        • The VO_<VO>_DEFAULT_SE variable specifies the default SE for the VO:
            export
    57. Uploading a file to the Grid /1
          $ lcg-cr --vo gilda -d file:/home/larocca/file1
          guid:6ac491ea-684c-11d8-8f12-9c97cebf582a
        where the only argument is the local file to be uploaded, and the -d <destination> option indicates the SE used as the destination for the file. The command returns the file GUID. If no destination is given, the SE specified by the VO_<VO>_DEFAULT_SE environment variable is used. The -P option allows the user to specify a relative path name for the file in the SE. If no -P option is given, the relative path is generated automatically.
    58. Uploading a file to the Grid /2
        The following are examples of the different ways to specify a destination:
          -d
          -d srm://
          -d  -P my_dir/my_file
        The -l <lfn> option can be used to specify an LFN:
          $ lcg-cr --vo gilda -d -l lfn:/grid/gilda/myalias1 file:/home/larocca/file1
          guid:db7ddbc5-613e-423f-9501-3c0c00a0ae24
    59. Replicating a file
          $ lcg-rep -v --vo gilda -d <SECOND_SE> guid:db7ddbc5-613e-423f-9501-3c0c00a0ae24
          Source URL: sfn://
          File size: 30
          Destination specified: <SECOND_SE>
          Source URL for copy: gsi
          Destination URL for copy: gsiftp://<SECOND_SE>/data/gilda/generated/2004-07-09/file50c0752c-f61f-4bc3-b48e-af3f22924b57
          # streams: 1
          Transfer took 2040 ms
          Destination URL registered in LRC: srm://<SECOND_SE>/data/gilda/generated/2004-07-09/file50c0752c-f61f-4bc3-b48e-af3f22924b57
    60. Listing replicas
          $ lcg-lr --vo gilda lfn:/grid/gilda/tutorials/larocca/my_alias1
          srm:// -09/file79aee616-6cd7-4b75-8848-f091
          srm://<SECOND_SE>/data/gilda/generated/2004-07-08/file0dcabb46-2214-4db8-9ee8-2930
        Again, an LFN, the GUID or a SURL can be used to specify the file.
    61. Copying files out of the Grid
          $ lcg-cp --vo gilda -t 100 -v lfn:/grid/gilda/tutorials/mytext.txt file:/tmp/mytext.txt
          Source URL: lfn:/grid/gilda/mytext.txt
          File size: 104857600
          Source URL for copy: gsi input2.dat.10.0
          Destination URL: file:///tmp/myfile
          # streams: 1
          # set timeout to 100 (seconds)
          85983232 bytes 8396.77 KB/sec avg 9216.11
          Transfer took 12040 ms
    62. Deleting replicas /1
        A file stored on an SE and registered in the LFC can be deleted using the lcg-del command.
        • If a SURL is provided as the argument, then that particular replica is deleted.
        • If an LFN or GUID is given instead, then the -s <SE> option must be used to indicate which of the replicas must be erased:
            $ lcg-del --vo gilda -s guid:91b89dfe-ff95-4614-bad2-c538bfa28fac
    63. Deleting replicas /2
        • If the -a option is used, all the replicas of the given file are deleted and unregistered from the catalogue:
            $ lcg-del --vo gilda -a guid:91b89dfe-ff95-4614-bad2-c538bfa28fac
    64. Registering Grid files
        The lcg-rf command registers a file physically present in an SE, creating a GUID-SURL mapping in the catalogue. The -g <GUID> option allows the user to specify a GUID (otherwise one is created automatically).
          $ lcg-rf --vo gilda -l lfn:/grid/gilda/newfile srm:// 7 08/file0dcabb46-2214-4db8-9ee8-2930de1
          guid:baddb707-0cb5-4d9a-8141-a046659d243b
    65. Unregistering Grid files
        lcg-uf deletes a GUID-SURL mapping (the GUID and the SURL being, respectively, the first and second arguments of the command) from the catalogue:
          $ lcg-uf --vo gilda guid:baddb707-0cb5-4d9a-8141-a046659d243b srm:// 07 08/file0dcabb46-2214-4db8-9ee8-2930de1
        If the last replica of a file is unregistered, the corresponding GUID-LFN mapping is also removed.
        Attention! lcg-uf just removes entries from the catalogue; it does not delete the file from the SE.
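The catalogue behaviour described for lcg-rf, lcg-lr and lcg-uf can be summarised with a toy in-memory model. The class below is purely illustrative, with hypothetical names and SURLs: it keeps LFN-to-GUID and GUID-to-SURL mappings, generates a GUID when none is given, and drops the LFN mapping once the last replica is unregistered. Like lcg-uf, it never touches any storage; it only edits mappings.

```python
# Toy in-memory model of the LFC mappings (hypothetical, for illustration):
# LFN -> GUID and GUID -> set of SURLs. Only mappings are edited; nothing is
# ever stored on, or deleted from, a Storage Element.
import uuid
from typing import Dict, Optional, Set


class ToyCatalogue:
    def __init__(self) -> None:
        self.lfn_to_guid: Dict[str, str] = {}
        self.replicas: Dict[str, Set[str]] = {}

    def register(self, surl: str, lfn: Optional[str] = None, guid: Optional[str] = None) -> str:
        """Like lcg-rf: add a GUID-SURL mapping, generating a GUID if none is given."""
        guid = guid or f"guid:{uuid.uuid4()}"
        self.replicas.setdefault(guid, set()).add(surl)
        if lfn:
            self.lfn_to_guid[lfn] = guid
        return guid

    def list_replicas(self, ref: str) -> Set[str]:
        """Like lcg-lr, but accepting only an LFN or a GUID."""
        guid = self.lfn_to_guid.get(ref, ref)
        return set(self.replicas.get(guid, set()))

    def unregister(self, guid: str, surl: str) -> None:
        """Like lcg-uf: drop one GUID-SURL mapping; after the last one, drop the LFNs too."""
        surls = self.replicas.get(guid, set())
        surls.discard(surl)
        if not surls:
            self.replicas.pop(guid, None)
            for lfn in [l for l, g in self.lfn_to_guid.items() if g == guid]:
                del self.lfn_to_guid[lfn]


cat = ToyCatalogue()
g = cat.register("srm://se1.example.org/data/f1", lfn="lfn:/grid/gilda/newfile")
cat.register("srm://se2.example.org/data/f1", guid=g)
print(sorted(cat.list_replicas("lfn:/grid/gilda/newfile")))
```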
    66. Working with large datasets
        • The InputSandbox and OutputSandbox attributes are the basic way to move files between the User Interface (UI) and the Worker Node (WN).
        • However, there are other ways to move files to and from the WN, to be used especially when large files (> 10 MB) are involved.
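A simple way to apply the 10 MB rule of thumb is to split the job's files by size before writing the JDL: small files go into the sandbox, large ones are moved through a Storage Element with the data management tools. The helper and the threshold constant below are illustrative only; treating "10 MB" as 10 MiB is an assumption of this sketch.

```python
# Illustrative split of a job's files by the ~10 MB rule of thumb.
# The helper is hypothetical; reading "10 MB" as 10 MiB is an assumption.
from typing import Dict, List, Tuple

SANDBOX_LIMIT_BYTES = 10 * 1024 * 1024


def plan_transfers(sizes: Dict[str, int]) -> Tuple[List[str], List[str]]:
    """Return (files for the sandbox, files to move via a Storage Element)."""
    sandbox = sorted(f for f, s in sizes.items() if s <= SANDBOX_LIMIT_BYTES)
    via_se = sorted(f for f, s in sizes.items() if s > SANDBOX_LIMIT_BYTES)
    return sandbox, via_se


print(plan_transfers({"run.sh": 2048, "db.fasta": 50 * 1024 * 1024}))
# (['run.sh'], ['db.fasta'])
```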
    67. [Figure: data flow among the User Interface, the WMS, the LCG File Catalogue (LFC), a Computing Element and a Storage Element, with arrows labelled "Input sandbox", "Output sandbox", "DataSets info" and "Input sandbox + Broker Info"]
    69. Running more realistic jobs with the GENIUS Grid portal: porting “BLAST” & “MrBayes” applications to the Grid
        Case study from CNR - ITB
    70. The GENIUS Grid Portal architecture
        • The GENIUS Grid portal (version 4.2 is free for educational use) is built on top of the EnginFrame Java/XML framework;
        • It is a gateway to the European EGEE Project middleware (and is easily customizable for other middleware);
        • It allows gLite-enabled applications to be exposed via a web browser as well as via Web Services.
    71. What is EnginFrame?
        • It is a web-based technology able to expose Grid services running on Grid infrastructures.
        • It allows organizations to provide application-oriented computing and data services to both users (via Web browsers) and applications (via SOAP/WSDL and/or RSS).
        • It is a Grid gateway!
        • It greatly simplifies the development of Web portals exposing computing services that can run on a broad range of different computational Grid systems.
    72. About MrBayes
        • MrBayes is a program for the Bayesian estimation of phylogeny.
        • Bayesian inference of phylogeny is based on the posterior probability distribution of trees, which is the probability of a tree conditioned on the observations.
          – To approximate the posterior probability distribution of trees, MrBayes uses a simulation technique called Markov Chain Monte Carlo (MCMC).
        • The program takes as input a character matrix in NEXUS file format.
        • The output is several files with the parameters that were sampled by the MCMC algorithm.
        • The application is CPU-demanding, especially if the MPI version of the software is used.
    73. EnginFrame & MrBayes
    74. The Users Tracking System (UTS) /1
    75. The Users Tracking System (UTS) /2
    76. About BLAST
        BLAST (Basic Local Alignment Search Tool) provides a method for rapid searching of nucleotide and protein databases.
        The program compares nucleotide or protein sequences to sequence databases and calculates the statistical significance of matches.
    77. Thank you for your attention!