Enabling Grids for E-sciencE



                  Distribute Data and Metadata
                           Management
     ...
Outline
                  Enabling Grids for E-sciencE




   • Grid Data Management Challenge

   • Storage Elements and ...
The Grid DM Challenge
                    Enabling Grids for E-sciencE




 • Heterogeneity
                              ...
SRM in an example
                     Enabling Grids for E-sciencE




                  She is running a job which needs...
SRM in an example
                  Enabling Grids for E-sciencE




   dCache
   Own system, own protocols
   and paramet...
SRM in an example
                  Enabling Grids for E-sciencE




   dCache
   Own system, own protocols
   and paramet...
Storage Resource Managers
                      Enabling Grids for E-sciencE




   • Data are stored on disk pool servers...
Data management with gLite
                      Enabling Grids for E-sciencE


    •   Assumptions:
         – Users and ...
gLite Grid Storage Requirements
                      Enabling Grids for E-sciencE




•   The Storage Element is the serv...
gLite Storage Element
                  Enabling Grids for E-sciencE




                            (http/https)




INFS...
gLite SE types history
                        Enabling Grids for E-sciencE



     • gLite data access protocols:
       ...
gLite SE types history(II)
                   Enabling Grids for E-sciencE




 • Mass Storage Systems (Castor)
      – Fi...
DPM Overview
                          Enabling Grids for E-sciencE


   • The Disk Pool Manager (DPM) is a lightweight so...
DPM Architecture
                          Enabling Grids for E-sciencE




                                              ...
DPM architecture (details)
                          Enabling Grids for E-sciencE




EGEE-II INFSO-RI-031688             ...
DPM architecture (details)
                          Enabling Grids for E-sciencE




      CLI, C API,
     SRM-enabled
 ...
DPM architecture (details)
                           Enabling Grids for E-sciencE




      CLI, C API,
     SRM-enabled
...
DPM architecture (details)
                           Enabling Grids for E-sciencE




      CLI, C API,
     SRM-enabled
...
DPM architecture (details)
                           Enabling Grids for E-sciencE




                                   ...
DPM architecture (details)
                           Enabling Grids for E-sciencE




                                   ...
DPM architecture (details)
                           Enabling Grids for E-sciencE




                                   ...
DPM strengths
                          Enabling Grids for E-sciencE




      • Easy to install/configure
            - F...
Files Naming conventions
                       Enabling Grids for E-sciencE


   • Logical File Name (LFN)
        – An a...
What is a file catalog
                                    File Catalog
                    Enabling Grids for E-sciencE

...
Upload and file convention example
                       Enabling Grids for E-sciencE

 [issgc59@issgc-ui ~]$ lcg-cr -v -...
The LFC (LCG File Catalog)
                        Enabling Grids for E-sciencE




•   It keeps track of the location of ...
LFC Features
                   Enabling Grids for E-sciencE



         – Cursors for large queries
         – Timeouts a...
lfc-ls
                         Enabling Grids for E-sciencE



 Listing the entries of a LFC directory
      lfc-ls  [-cd...
lfc-mkdir
                     Enabling Grids for E-sciencE




   Creating directories in the LFC
        lfc-mkdir [-m m...
lfc-ln
                      Enabling Grids for E-sciencE


         Creating a symbolic link
         lfc-ln -s file link...
LFC commands
                      Enabling Grids for E-sciencE


     Summary of the LFC Catalog commands
     lfc-chmod ...
LFC C API
                      Enabling Grids for E-sciencE


    Low level methods (many POSIX-like):
                  ...
GFAL: Grid File Access Library
                        Enabling Grids for E-sciencE


•   Interactions with SE require som...
GFAL: File I/O API (I)
                   Enabling Grids for E-sciencE




    int gfal_access (const char *path, int amod...
GFAL: FC API (II)
                   Enabling Grids for E-sciencE



   int gfal_closedir (DIR *dirp);
   int gfal_mkdir (...
GFAL: Catalog API
                     Enabling Grids for E-sciencE


        int create_alias (const char *guid, const ch...
GFAL: SRM API
                     Enabling Grids for E-sciencE




       int deletesurl (const char *surl)
       int ge...
GFAL Java API
                   Enabling Grids for E-sciencE




      •   GFAL API are available for C/C++ programmers
 ...
lcg_utils DM tools
                    Enabling Grids for E-sciencE



    • High level interface (CL tools and APIs) to
 ...
lcg-utils commands
                         Enabling Grids for E-sciencE

     Replica Management
     lcg-cp          Cop...
Enabling Grids for E-sciencE
                                                                     LFC interfaces
         ...
Data movement introduction
                     Enabling Grids for E-sciencE




    • Grids are naturally distributed sys...
Direct Client Controlled
                      Enabling Grids for E-sciencE
                                              ...
Transfer Service
                      Enabling Grids for E-sciencE




    • Clear need for a service
      for data tran...
gLite FTS: Channels
                    Enabling Grids for E-sciencE




   •   FTS Service has a concept of
       channe...
Why Grid needs Metadata?
                    Enabling Grids for E-sciencE




   • Grids allow to save millions of files s...
Basic Metadata Concept
                    Enabling Grids for E-sciencE




     • Entries – Representation of real world ...
Example: Movie Trailers
                          Enabling Grids for E-sciencE




      • Movie trailers files (entries) ...
Trailer example
                          Enabling Grids for E-sciencE




 Entry names                              Title...
Trailer example
                          Enabling Grids for E-sciencE




                                               ...
Trailer example
                          Enabling Grids for E-sciencE




                           Schema              ...
Trailer example
                          Enabling Grids for E-sciencE




                           Schema              ...
Trailer example
                          Enabling Grids for E-sciencE




                           Schema              ...
Metadata service on the Grid
                  Enabling Grids for E-sciencE




INFSO-RI-508833                           ...
Metadata service on the Grid
                  Enabling Grids for E-sciencE



   • Information about files -- but not onl...
Metadata service on the Grid
                   Enabling Grids for E-sciencE



   • Information about files -- but not on...
Metadata service on the Grid
                   Enabling Grids for E-sciencE



   • Information about files -- but not on...
Metadata service on the Grid
                   Enabling Grids for E-sciencE



   • Information about files -- but not on...
Metadata service on the Grid
                   Enabling Grids for E-sciencE



   • Information about files -- but not on...
Metadata service on the Grid
                   Enabling Grids for E-sciencE



   • Information about files -- but not on...
Monitoring of running application
                        Enabling Grids for E-sciencE




                   SE          ...
Inputset for parametric jobs
                             Enabling Grids for E-sciencE




   • /grid/my_simulation/input
...
A possible parameter-get.sh script
                      Enabling Grids for E-sciencE

   #!/bin/bash
   # Find the first ...
Use a Metadata services to exchange data
                  Enabling Grids for E-sciencE
                                  ...
Use a Metadata services to exchange data
                   Enabling Grids for E-sciencE
                                 ...
Use a Metadata services to exchange data
                   Enabling Grids for E-sciencE
                                 ...
Information exchanging among grid peers
                        Enabling Grids for E-sciencE




                         ...
The AMGA Metadata Catalogue
                    Enabling Grids for E-sciencE




    • Official metadata service for the g...
AMGA Features
                  Enabling Grids for E-sciencE




INFSO-RI-508833                                  ISSGC 09...
AMGA Features
                    Enabling Grids for E-sciencE




 • Dynamic Schemas
      – Schemas can be modified at r...
AMGA Features
                    Enabling Grids for E-sciencE




 • Dynamic Schemas
      – Schemas can be modified at r...
AMGA Features
                      Enabling Grids for E-sciencE




 • Dynamic Schemas
      – Schemas can be modified at...
AMGA Features
                      Enabling Grids for E-sciencE




 • Dynamic Schemas
      – Schemas can be modified at...
Example
                  Enabling Grids for E-sciencE




INFSO-RI-508833                                  ISSGC 09, Soph...
AMGA Security
                  Enabling Grids for E-sciencE




INFSO-RI-508833                                  ISSGC 09...
AMGA Security
                  Enabling Grids for E-sciencE




  • Unix style permissions - users and groups




INFSO-R...
AMGA Security
                  Enabling Grids for E-sciencE




  • Unix style permissions - users and groups
  • ACLs – ...
AMGA Security
                  Enabling Grids for E-sciencE




  • Unix style permissions - users and groups
  • ACLs – ...
AMGA Security
                  Enabling Grids for E-sciencE




  •   Unix style permissions - users and groups
  •   ACL...
AMGA Security
                  Enabling Grids for E-sciencE




  •   Unix style permissions - users and groups
  •   ACL...
AMGA Implementation
                            Enabling Grids for E-sciencE




 • C++ multiprocess server
      – Backen...
AMGA Datatypes
                  Enabling Grids for E-sciencE




   - Using the above datatypes you are sure that your
  ...
Accessing AMGA from UI/WNs
                     Enabling Grids for E-sciencE



   • TCP Streaming Front-end
        – mdc...
Advanced features: Metadata Replication
                        Enabling Grids for E-sciencE




     • AMGA provides a re...
Metadata Replication: Use cases
                       Enabling Grids for E-sciencE




                  Full replication...
Existing SQL DBs importing
                   Enabling Grids for E-sciencE




    • Since AMGA 1.2.10, a new import featu...
Set up AMGA to access your tables
                        Enabling Grids for E-sciencE




     • To remember: AMGA stores...
Set up AMGA to access your tables
                     Enabling Grids for E-sciencE




      • Properly set up authorizat...
Native SQL Support
                      Enabling Grids for E-sciencE



   • Objective:
        – implement native SQL qu...
Enabling Grids for E-sciencE
                                                     Native SQL example
    Query> INSERT INT...
Biomed - MDM
                      Enabling Grids for E-sciencE




•   Medical Data Manager – MDM
    – Store and access ...
gMOD: grid Movie On Demand
                  Enabling Grids for E-sciencE




  • gMOD provides a Video-On-Demand service
...
gMOD under the hood
                  Enabling Grids for E-sciencE




     • Built on top of gLite services:
     • Stora...
gMOD interactions
                  Enabling Grids for E-sciencE




      VOMS                                           ...
gMOD screenshot
                      Enabling Grids for E-sciencE


        gMOD is accesible through the Genius Portal (...
gLibrary features
                  Enabling Grids for E-sciencE



   • INFN-developed tool totally gLite based
   • It a...
gLibrary as the iTunes for the Grid
                  Enabling Grids for E-sciencE




INFSO-RI-508833                    ...
Browse & Search
                     Enabling Grids for E-sciencE

   • Assets can be browsed selecting a type (or categor...
Organize assets
                    Enabling Grids for E-sciencE

 • “Types” and “Categories” definition by repository
   ...
Store & Retrieve
                  Enabling Grids for E-sciencE


•   Users can upload their local assets on one or more
 ...
gLibrary architecture
                  Enabling Grids for E-sciencE




INFSO-RI-508833                                  ...
Features
                          Enabling Grids for E-sciencE


     • Implemented as Web 2.0 application
          – AJ...
Technologies used
                  Enabling Grids for E-sciencE




INFSO-RI-508833                                   ISS...
Technologies used
                  Enabling Grids for E-sciencE



  • Web standards:
       – Javascript/AJAX/JSON on th...
Technologies used
                  Enabling Grids for E-sciencE



  • Web standards:
       – Javascript/AJAX/JSON on th...
Technologies used
                   Enabling Grids for E-sciencE



  • Web standards:
       – Javascript/AJAX/JSON on t...
De Roberto cultural heritage
                  Enabling Grids for E-sciencE



  • De Roberto, an Italian writer of the XI...
Enabling Grids for E-sciencE




INFSO-RI-508833                                  ISSGC’09, Taipei, 21st April 09   77
Digitalize to preserve them
                  Enabling Grids for E-sciencE




 • Some sheets are damaged (mold, crumbed
 ...
Enabling Grids for E-sciencE




INFSO-RI-508833                                  ISSGC’09, Taipei, 21st April 09   79
Acquisition stage
                  Enabling Grids for E-sciencE




INFSO-RI-508833                                      ...
Acquisition stage
                     Enabling Grids for E-sciencE


• Digitalization of manuscripts, typescripts, printe...
Acquisition stage
                      Enabling Grids for E-sciencE


• Digitalization of manuscripts, typescripts, print...
Goals and requirements
                  Enabling Grids for E-sciencE




INFSO-RI-508833                                 ...
Goals and requirements
                  Enabling Grids for E-sciencE


 • Make those works accessible to the humanistic c...
Goals and requirements
                    Enabling Grids for E-sciencE


 • Make those works accessible to the humanistic...
Goals and requirements
                    Enabling Grids for E-sciencE


 • Make those works accessible to the humanistic...
What Data Grids can offer to them
                  Enabling Grids for E-sciencE




INFSO-RI-508833                      ...
What Data Grids can offer to them
                  Enabling Grids for E-sciencE


   • store the 8000 scans of De Roberto...
What Data Grids can offer to them
                  Enabling Grids for E-sciencE


   • store the 8000 scans of De Roberto...
What Data Grids can offer to them
                  Enabling Grids for E-sciencE


   • store the 8000 scans of De Roberto...
What Data Grids can offer to them
                  Enabling Grids for E-sciencE


   • store the 8000 scans of De Roberto...
What Data Grids can offer to them
                  Enabling Grids for E-sciencE


   • store the 8000 scans of De Roberto...
What Data Grids can offer to them
                  Enabling Grids for E-sciencE


   • store the 8000 scans of De Roberto...
What Data Grids can offer to them
                  Enabling Grids for E-sciencE


   • store the 8000 scans of De Roberto...
Data Grids to preserve Cultural Heritage
                  Enabling Grids for E-sciencE




INFSO-RI-508833               ...
Live Demo
                  Enabling Grids for E-sciencE




INFSO-RI-508833                                  ISSGC’09, Ta...
Data Management Services Summary
                  Enabling Grids for E-sciencE



 • Storage Element – save date and prov...
References
                  Enabling Grids for E-sciencE




 •    gLite documentation homepage
      – http://glite.web....
References
                     Enabling Grids for E-sciencE


   • AMGA Web Site
             http://cern.ch/amga


   • ...
gLibrary references
                          Enabling Grids for E-sciencE




     •    gLibray project homepage:
       ...
Session 24 - Distribute Data and Metadata Management with gLite
Upcoming SlideShare
Loading in...5
×

Session 24 - Distribute Data and Metadata Management with gLite

6,603

Published on

Published in: Education, Technology
0 Comments
1 Like
Statistics
Notes
  • Be the first to comment

No Downloads
Views
Total Views
6,603
On Slideshare
0
From Embeds
0
Number of Embeds
0
Actions
Shares
0
Downloads
44
Comments
0
Likes
1
Embeds 0
No embeds

No notes for slide

Session 24 - Distribute Data and Metadata Management with gLite

  1. 1. Enabling Grids for E-sciencE Distribute Data and Metadata Management with gLite Antonio Calanducci antonio.calanducci@ct.infn.it National Institute of Nuclear Physics INFN Catania EGEE NA3 Training & Induction ISSGC 2009 Sophia-Antipolis, Polytech’Nice-Sofia , 10/07/2009 www.eu-egee.org INFSO-RI-031688
  2. 2. Outline Enabling Grids for E-sciencE • Grid Data Management Challenge • Storage Elements and SRM • File Catalogs and Data Management tools • Metadata Service with use cases INFSO-RI-508833 ISSGC 09, Sophia-Antipolis, 10-07-09 2
  3. 3. The Grid DM Challenge Enabling Grids for E-sciencE • Heterogeneity – Need common interface – Data are stored on different to storage resources storage systems using different  Storage Resource access technologies Manager (SRM) • Distribution – Data are stored in different – Need to keep track where locations – in most cases there data is stored is no shared file system or  File and Replica Catalogs common namespace – Need scheduled, reliable – Data need to be moved between file transfer different locations  File transfer service • Data about data – Data stored can be huge: find – Need a way to describe files according to their contents files’ content and query their descriptions  Metadata service INFSO-RI-508833 ISSGC 09, Sophia-Antipolis, 10-07-09 3
  4. 4. SRM in an example Enabling Grids for E-sciencE She is running a job which needs: Data for physics event reconstruction Simulated Data Some data analysis files They are at CERN in dCache She will write files remotely too They are at Fermilab In a disk array They are at Nikhef in a classic SE INFSO-RI-508833 ISSGC 09, Sophia-Antipolis, 10-07-09 4
  5. 5. SRM in an example Enabling Grids for E-sciencE dCache Own system, own protocols and parameters You as a user need gLite DPM to know all Independent system from dCache or Castor the systems!!! Castor No connection with dCache or DPM INFSO-RI-508833 ISSGC 09, Sophia-Antipolis, 10-07-09 5
  6. 6. SRM in an example Enabling Grids for E-sciencE dCache Own system, own protocols and parameters I talk to them on your behalf You as a I will user needspace even allocate gLite DPM for your know all to files SRM Independent system from And I will use transfer dCache or Castor protocols the to send your files theresystems!!! Castor No connection with dCache or DPM INFSO-RI-508833 ISSGC 09, Sophia-Antipolis, 10-07-09 5
  7. 7. Storage Resource Managers Enabling Grids for E-sciencE • Data are stored on disk pool servers or Mass Storage Systems • Storage resource management needs to take into account – Transparent access to files (migration to/from disk pool) – File pinning – Space reservation – File status notification – Life time management • The SRM (Storage Resource Manager) takes care of all these details – The SRM is a single interface that takes care of local storage interaction and provides a Grid interface to the outside world INFSO-RI-508833 ISSGC 09, Sophia-Antipolis, 10-07-09 6
  8. 8. Data management with gLite Enabling Grids for E-sciencE • Assumptions: – Users and programs produce and require data – the lowest granularity of the data is on the file level (we deal with files rather than data objects or tables)  Data = files • Files: – Mostly, write once, read many – Located in Storage Elements (SEs) – Several replicas of one file in different sites – Accessible by Grid users and applications from “anywhere” – Locatable by the WMS (data requirements in JDL) • Also… – WMS can send (small amounts of) data to/from jobs: Input and Output Sandbox – Files may be copied from/to local filesystems (WNs, UIs) to the Grid (SEs) INFSO-RI-508833 ISSGC 09, Sophia-Antipolis, 10-07-09 7
  9. 9. gLite Grid Storage Requirements Enabling Grids for E-sciencE • The Storage Element is the service which allow a user or an application to store data for future retrieval • Manage local storage (disks) and interface to Mass Storage Systems(tapes) like – HPSS, CASTOR, DiskeXtender (UNITREE), … • Be able to manage different storage systems uniformly and transparently for the user (providing an SRM interface) • Support basic file transfer protocols – GridFTP mandatory – Others if available (https, ftp, etc) • Support a native I/O (remote file) access protocol – POSIX (like) I/O client library for direct access of data (GFAL) INFSO-RI-508833 ISSGC 09, Sophia-Antipolis, 10-07-09 8
  10. 10. gLite Storage Element Enabling Grids for E-sciencE (http/https) INFSO-RI-508833 ISSGC 09, Sophia-Antipolis, 10-07-09 9
  11. 11. gLite SE types history Enabling Grids for E-sciencE • gLite data access protocols: – File Transfer: GSIFTP (GridFTP)/HTTP(S) – File I/O (Remote File access):  gsidcap  rfio (insecure RFIO)  gsirfio (secured RFIO) • Classic SE: – GridFTP server – Insecure RFIO daemon (rfiod) – only LAN limited file access – Single disk or disk array – No quota management – Does not support the SRM interface INFSO-RI-508833 ISSGC 09, Sophia-Antipolis, 10-07-09 10
  12. 12. gLite SE types history(II) Enabling Grids for E-sciencE • Mass Storage Systems (Castor) – Files migrated between front-end disk and back-end tape storage hierarchies – GridFTP server – Insecure RFIO (Castor) – Provide a SRM interface with all the benefits • Disk pool managers (dCache and gLite DPM) – manage distributed storage servers in a centralized way – Physical disks or arrays are combined into a common (virtual) file system – Disks can be dynamically added to the pool – GridFTP server – Secure remote access protocols (gsidcap for dCache, gsirfio for DPM) – SRM interface INFSO-RI-508833 ISSGC 09, Sophia-Antipolis, 10-07-09 11
  13. 13. DPM Overview Enabling Grids for E-sciencE • The Disk Pool Manager (DPM) is a lightweight solution for disk storage management, which offers the SRM interfaces (2.2 released in DPM version 1.6.3). • Each DPM–type Storage Element (SE), is composed by an head node and one (in the same machine) or more disk servers • The DPM handles the storage on Disk Servers. – It handles pools: a pool is a group of file systems, located on one or more disk servers. The DPM Disk Servers can have multiple filesystems in the pool. • Supported protocols: – File Transfers: GridFTP, HTTP/HTTPs – File I/O: GSIRFIO EGEE-II INFSO-RI-031688 EGEE Tutorial - Barcelona 14th - 18th April 2008, 12 3
  14. 14. DPM Architecture Enabling Grids for E-sciencE /dpm /domain /home CLI, C API, /vo (uid, gid1, …) SRM-enabled client, DPM etc. da head node file ta tra • DPM Name Server ns fe Namespace r Authorization Physical files location • Disk Servers Physical files • Direct data transfer from/to DPM disk server (no bottleneck) disk servers EGEE-II INFSO-RI-031688 EGEE Tutorial - Barcelona 14th - 18th April 2008, 13 5
  15. 15. DPM architecture (details) Enabling Grids for E-sciencE EGEE-II INFSO-RI-031688 EGEE Tutorial - Barcelona 14th - 18th April 2008, 14 6
  16. 16. DPM architecture (details) Enabling Grids for E-sciencE CLI, C API, SRM-enabled client, etc. EGEE-II INFSO-RI-031688 EGEE Tutorial - Barcelona 14th - 18th April 2008, 14 6
  17. 17. DPM architecture (details) Enabling Grids for E-sciencE CLI, C API, SRM-enabled client, etc. SRMv1 SRMv2 SRMv2.2 DPM /dpm head node /domain /home DPM DPNS /vo file daemon daemon EGEE-II INFSO-RI-031688 EGEE Tutorial - Barcelona 14th - 18th April 2008, 14 6
  18. 18. DPM architecture (details) Enabling Grids for E-sciencE CLI, C API, SRM-enabled client, etc. SRMv1 SRMv2 SRMv2.2 DPM /dpm head node /domain /home DPM DPNS /vo file daemon daemon EGEE-II INFSO-RI-031688 EGEE Tutorial - Barcelona 14th - 18th April 2008, 14 6
  19. 19. DPM architecture (details) Enabling Grids for E-sciencE DPM … disk servers CLI, C API, Secure RFIO Secure RFIO GridFTP GridFTP SRM-enabled client, etc. SRMv1 SRMv2 SRMv2.2 DPM /dpm head node /domain /home DPM DPNS /vo file daemon daemon EGEE-II INFSO-RI-031688 EGEE Tutorial - Barcelona 14th - 18th April 2008, 14 6
  20. 20. DPM architecture (details) Enabling Grids for E-sciencE data transfer DPM … disk servers CLI, C API, Secure RFIO Secure RFIO GridFTP GridFTP SRM-enabled client, etc. SRMv1 SRMv2 SRMv2.2 DPM /dpm head node /domain /home DPM DPNS /vo file daemon daemon EGEE-II INFSO-RI-031688 EGEE Tutorial - Barcelona 14th - 18th April 2008, 14 6
  21. 21. DPM architecture (details) Enabling Grids for E-sciencE data transfer DPM … disk servers CLI, C API, Secure RFIO Secure RFIO GridFTP GridFTP SRM-enabled client, etc. SRMv1 SRMv2 SRMv2.2 DPM /dpm head node /domain /home DPM DPNS /vo file daemon daemon DPNS database DPM database EGEE-II INFSO-RI-031688 EGEE Tutorial - Barcelona 14th - 18th April 2008, 14 6
  22. 22. DPM strengths Enabling Grids for E-sciencE • Easy to install/configure - Few configuration files • Manageable storage - Logical Namespace - Easy to add/remove file systems • Low maintenance effort • Supports as many disk servers as needed • Low memory footprint • Low CPU utilization EGEE-II INFSO-RI-031688 EGEE Tutorial - Barcelona 14th - 18th April 2008, 15 7
  23. 23. Files Naming conventions Enabling Grids for E-sciencE • Logical File Name (LFN) – An alias created by a user to refer to some item of data, e.g. “lfn:/grid/ gilda/20030203/run2/track1” • Globally Unique Identifier (GUID) – A non-easy-to-remember unique identifier for an item of data, e.g. “guid:f81d4fae-7dec-11d0-a765-00a0c91e6bf6” • Site URL (SURL) (or Physical File Name (PFN) or Site FN) – The location of an actual piece of data on a storage system e.g. “srm://grid009.ct.infn.it/dpm/ct.infn.it/gilda/output10_1” (SRM) • Transport URL (TURL) – Temporary locator of a replica + access protocol: understood by a SE – e.g. “rfio://lxshare0209.cern.ch//data/alice/ntuples.dat” INFSO-RI-508833 ISSGC 09, Sophia-Antipolis, 10-07-09 16
  24. 24. What is a file catalog File Catalog Enabling Grids for E-sciencE SE SE gLite SE UI INFSO-RI-508833 ISSGC 09, Sophia-Antipolis, 10-07-09 17
  25. 25. Upload and file convention example Enabling Grids for E-sciencE [issgc59@issgc-ui ~]$ lcg-cr -v -d aliserv6.ct.infn.it -l lfn:/grid/gilda/tutorials/ issgc09/message0.sh file:/home/issgc59/message0.sh Using grid catalog type: lfc Using grid catalog : lfc-gilda.ct.infn.it SE type: SRMv1 Destination SURL : srm://aliserv6.ct.infn.it/dpm/ct.infn.it/home/gilda/generated/ 2009-07-10/file273edfaa-e0ed-448c-b713-7996c2d09f88 Source SRM Request Token: 401296 Source URL: file:/home/issgc59/message0.sh File size: 20 VO name: gilda Destination specified: aliserv6.ct.infn.it Destination URL for copy: gsiftp://aliserv6.ct.infn.it/aliserv6.ct.infn.it:/data01/gilda/ 2009-07-10/file273edfaa-e0ed-448c-b713-7996c2d09f88.401296.0 # streams: 1 20 bytes 0.01 KB/sec avg 0.01 KB/sec inst Transfer took 3070 ms Using LFN: lfn:/grid/gilda/tutorials/issgc09/message0.sh Using GUID: guid:2eed7180-2505-46fb-bef2-19ef77f1b2fd Registering LFN: /grid/gilda/tutorials/issgc09/message0.sh (2eed7180-2505-46fb- bef2-19ef77f1b2fd) Registering SURL: srm://aliserv6.ct.infn.it/dpm/ct.infn.it/home/gilda/generated/ 2009-07-10/file273edfaa-e0ed-448c-b713-7996c2d09f88 (2eed7180-2505-46fb-bef2-19ef77f1b2fd) guid:2eed7180-2505-46fb-bef2-19ef77f1b2fd INFSO-RI-508833 ISSGC 09, Sophia-Antipolis, 10-07-09
  26. 26. The LFC (LCG File Catalog) Enabling Grids for E-sciencE • It keeps track of the location of copies (replicas) of Grid files • LFN acts as main key in the database. It has: – Symbolic links to it (additional LFNs) – Unique Identifier (GUID) – System metadata – Information on replicas – One field of user metadata INFSO-RI-508833 ISSGC 09, Sophia-Antipolis, 10-07-09 19
  27. 27. LFC Features Enabling Grids for E-sciencE – Cursors for large queries – Timeouts and retries from the client – User exposed transactional API (+ auto rollback on failure) – Hierarchical namespace and namespace operations (for LFNs) – Integrated GSI Authentication + Authorization – Access Control Lists (Unix Permissions and POSIX ACLs) – Checksums – Integration with VOMS (VirtualID and VirtualGID) INFSO-RI-508833 ISSGC 09, Sophia-Antipolis, 10-07-09 20
  28. 28. lfc-ls Enabling Grids for E-sciencE Listing the entries of a LFC directory lfc-ls [-cdiLlRTu] [--class] [--comment] [--deleted] [--display_side] [--ds] path… where path specifies the LFN pathname (mandatory) – Remember that LFC has a directory tree structure – /grid/<VO_name>/<you create it> LFC Namespace Defined by the user – All members of a VO have read-write permissions under their directory – You can set LFC_HOME to use relative paths > lfc-ls /grid/gilda/tony -l : long listing > export LFC_HOME=/grid/gilda -R : list the contents of directories > lfc-ls -l tony recursively: Don’t use it! > lfc-ls -l -R /grid INFSO-RI-508833 ISSGC 09, Sophia-Antipolis, 10-07-09 21
  29. 29. lfc-mkdir Enabling Grids for E-sciencE Creating directories in the LFC lfc-mkdir [-m mode] [-p] path... • Where path specifies the LFC pathname • Remember that while registering a new file (using lcg-cr, for example) the corresponding destination directory must be created in the catalog beforehand. • Examples: > lfc-mkdir /grid/gilda/tony/demo You can just check the directory with: > lfc-ls -l /grid/gilda/tony drwxr-xrwx 0 19122 1077 0 Jun 14 11:36 demo INFSO-RI-508833 ISSGC 09, Sophia-Antipolis, 10-07-09 22
  30. 30. lfc-ln Enabling Grids for E-sciencE Creating a symbolic link lfc-ln -s file linkname lfc-ln -s directory linkname Create a link to the specified file or directory with linkname – Examples: > lfc-ln -s /grid/gilda/tony/demo/test /grid/gilda/tony/aLink Symbolic Original File link Let’s check the link using lfc-ls with long listing (-l): > lfc-ls -l lrwxrwxrwx 1 19122 1077 0 Jun 14 11:58 aLink ->/grid/gilda/tony/demo/test drwxr-xrwx 1 19122 1077 0 Jun 14 11:39 demo INFSO-RI-508833 ISSGC 09, Sophia-Antipolis, 10-07-09 23
  31. 31. LFC commands Enabling Grids for E-sciencE Summary of the LFC Catalog commands lfc-chmod Change access mode of the LFC file/directory lfc-chown Change owner and group of the LFC file-directory lfc-delcomment Delete the comment associated with the file/directory lfc-getacl Get file/directory access control lists lfc-ln Make a symbolic link to a file/directory lfc-ls List file/directory entries in a directory lfc-mkdir Create a directory lfc-rename Rename a file/directory lfc-rm Remove a file/directory lfc-setacl Set file/directory access control lists lfc-setcomment Add/replace a comment INFSO-RI-508833 ISSGC 09, Sophia-Antipolis, 10-07-09 24
  32. 32. LFC C API Enabling Grids for E-sciencE Low level methods (many POSIX-like): lfc_setacl lfc_access lfc_deleteclass lfc_listreplica lfc_setatime lfc_aborttrans lfc_delreplica lfc_lstat lfc_setcomment lfc_addreplica lfc_endtrans lfc_mkdir lfc_seterrbuf lfc_apiinit lfc_enterclass lfc_modifyclass lfc_setfsize lfc_chclass lfc_errmsg lfc_opendir lfc_starttrans lfc_chdir lfc_getacl lfc_queryclass lfc_stat lfc_chmod lfc_getcomment lfc_readdir lfc_symlink lfc_chown lfc_getcwd lfc_readlink lfc_umask lfc_closedir lfc_getpath lfc_rename lfc_undelete lfc_creat lfc_lchown lfc_rewind lfc_unlink lfc_delcomment lfc_listclass lfc_rmdir lfc_utime lfc_delete lfc_listlinks lfc_selectsrvr send2lfc INFSO-RI-508833 ISSGC 09, Sophia-Antipolis, 10-07-09 25
  33. 33. GFAL: Grid File Access Library Enabling Grids for E-sciencE • Interactions with SE require some components: - File catalog services to locate replicas - SRM - File access mechanism to access files from the SE on the WN • GFAL does all this tasks for you: - Hides all these operations - Presents a POSIX interface for the I/O operations - Single shared library in threaded and unthreaded versions (libgfal.so, libgfal_pthr.so) - Single header file (gfal_api.h) - User can create all commands needed for storage management - It offers as well an interface to SRM • Supported protocols: - file (local or nfs-like access) - dcap, gsidcap and kdcap (dCache access) - rfio (castor access) and gsirfio (DPM) INFSO-RI-508833 ISSGC 09, Sophia-Antipolis, 10-07-09 26
  34. 34. GFAL: File I/O API (I) Enabling Grids for E-sciencE int gfal_access (const char *path, int amode); int gfal_chmod (const char *path, mode_t mode); int gfal_close (int fd); int gfal_creat (const char *filename, mode_t mode); off_t gfal_lseek (int fd, off_t offset, int whence); int gfal_open (const char * filename, int flags, mode_t mode); ssize_t gfal_read (int fd, void *buf, size_t size); int gfal_rename (const char *old_name, const char *new_name); ssize_t gfal_setfilchg (int, const void *, size_t); int gfal_stat (const char *filename, struct stat *statbuf); int gfal_unlink (const char *filename); ssize_t gfal_write (int fd, const void *buf, size_t size); INFSO-RI-508833 ISSGC 09, Sophia-Antipolis, 10-07-09 27
  35. 35. GFAL: FC API (II) Enabling Grids for E-sciencE int gfal_closedir (DIR *dirp); int gfal_mkdir (const char *dirname, mode_t mode); DIR *gfal_opendir (const char *dirname); struct dirent *gfal_readdir (DIR *dirp); int gfal_rmdir (const char *dirname); INFSO-RI-508833 ISSGC 09, Sophia-Antipolis, 10-07-09 28
  36. 36. GFAL: Catalog API Enabling Grids for E-sciencE int create_alias (const char *guid, const char *lfn, long long size) int guid_exists (const char *guid) char *guidforpfn (const char *surl) char *guidfromlfn (const char *lfn) char **lfnsforguid (const char *guid) int register_alias (const char *guid, const char *lfn) int register_pfn (const char *guid, const char *surl) int setfilesize (const char *surl, long long size) char *surlfromguid (const char *guid) char **surlsfromguid (const char *guid) int unregister_alias (const char *guid, const char *lfn) int unregister_pfn (const char *guid, const char *surl) INFSO-RI-508833 ISSGC 09, Sophia-Antipolis, 10-07-09 29
  37. 37. GFAL: SRM API Enabling Grids for E-sciencE int deletesurl (const char *surl) int getfilemd (const char *surl, struct stat64 *statbuf) int set_xfer_done (const char *surl, int reqid, int fileid, char *token, int oflag) int set_xfer_running (const char *surl, int reqid, int fileid, char *token) char *turlfromsurl (const char *surl, char **protocols, int oflag, int *reqid, int *fileid, char **token) int srm_get (int nbfiles, char **surls, int nbprotocols, char **protocols, int *reqid, char **token, struct srm_filestatus **filestatuses) int srm_getstatus (int nbfiles, char **surls, int reqid, char *token, struct srm_filestatus **filestatuses) INFSO-RI-508833 ISSGC 09, Sophia-Antipolis, 10-07-09 30
  38. 38. GFAL Java API Enabling Grids for E-sciencE • GFAL API are available for C/C++ programmers • We wrote a wrapper around the C APIs using Java Native Interface and a the Java APIs on top of it • More information can be found here: https://grid.ct.infn.it/twiki/bin/view/GILDA/APIGFAL INFSO-RI-508833 ISSGC 09, Sophia-Antipolis, 10-07-09 31
  39. 39. lcg_utils DM tools Enabling Grids for E-sciencE • High level interface (CL tools and APIs) to – Upload/download files to/from the Grid (UI,CE and WN <---> SEs) – Replicate data between SEs and locate the best replica available – Interact with the file catalog • Definition: A file is considered to be a Grid File if it is both physically present in a SE and registered in the File Catalog • lcg-utils ensure the consistency between files in the Storage Elements and entries in the File Catalog INFSO-RI-508833 ISSGC 09, Sophia-Antipolis, 10-07-09 32
  40. 40. lcg-utils commands Enabling Grids for E-sciencE Replica Management lcg-cp Copies a grid file to a local destination lcg-cr Copies a file to a SE and registers the file in the catalog lcg-del Delete one file lcg-rep Replication between SEs and registration of the replica lcg-gt Gets the TURL for a given SURL and transfer protocol lcg-sd Sets file status to “Done” for a given SURL in a SRM request File Catalog Interaction lcg-aa Add an alias in LFC for a given GUID lcg-ra Remove an alias in LFC for a given GUID lcg-rf Registers in LFC a file placed in a SE lcg-uf Unregisters in LFC a file placed in a SE lcg-la Lists the alias for a given SURL, GUID or LFN lcg-lg Get the GUID for a given LFN or SURL lcg-lr Lists the replicas for a given GUID, SURL or LFN INFSO-RI-508833 ISSGC 09, Sophia-Antipolis, 10-07-09 33
  41. 41. Enabling Grids for E-sciencE LFC interfaces SE SE SE LCG UTIL GFAL Python LFC CLIENT LFC C API SERVER WMS DLI CLI lfc-ls, lfc-mkdir, lfc-setacl, … INFSO-RI-508833 ISSGC 09, Sophia-Antipolis, 10-07-09 34
  42. 42. Data movement introduction Enabling Grids for E-sciencE • Grids are naturally distributed systems • The means that data also needs to be distributed – First generation data distribution mainly concentrated on copy protocols in a grid environment:  gridftp  http + mod_gridsite • But copies controlled by clients have problems… INFSO-RI-508833 ISSGC 09, Sophia-Antipolis, 10-07-09
  43. 43. Direct Client Controlled Enabling Grids for E-sciencE Data Movement Client Control Channels Source Storage Element Data Flow Channel Destination Storage Element • Although transport protocol may be robust, state is held inside client – inconvenient and fragile. • Client only knows about local state, no sense of global knowledge about data transfers between storage elements. – Storage elements overwhelmed with replication requests – Multiple replications of the same data can happen simultaneously – Site has little control over balance of network resources - DoS INFSO-RI-508833 ISSGC 09, Sophia-Antipolis, 10-07-09
  44. 44. Transfer Service Enabling Grids for E-sciencE • Clear need for a service for data transfer •Submit new request •Monitor progress – Client connects to service Client •Cancel request to submit request – Service maintains state about transfer SOAP via https – Client can periodically reconnect to check status or cancel request Transfer – Service can have Service Control knowledge of global state, not just a single request  Load balancing  Scheduling Source Storage Destination Element Storage Data Element Flow INFSO-RI-508833 ISSGC 09, Sophia-Antipolis, 10-07-09
  45. 45. gLite FTS: Channels Enabling Grids for E-sciencE • FTS Service has a concept of channels • A channel is a unidirectional connection between two sites • Transfer requests between these two sites are assigned to that channel • Channels usually correspond to a • Channels control certain dedicated network pipe (e.g., OPN) transfer properties: associated with production transfer concurrency, • But channels can also take gridftp streams. wildcards: – * to MY_SITE : All incoming • Channels can be – MY SITE to * : All outgoing controlled independently: started, – * to * : Catch all stopped, drained. INFSO-RI-508833 ISSGC 09, Sophia-Antipolis, 10-07-09
  46. 46. Why Grid needs Metadata? Enabling Grids for E-sciencE • Grids allow to save millions of files spread over several storage sites. • Users and applications need an efficient mechanism – to describe files – to locate files based on their contents • This is achieved by – associating descriptive attributes to files  Metadata is data about data – answering user queries against the associated information INFSO-RI-508833 ISSGC 09, Sophia-Antipolis, 10-07-09
  47. 47. Basic Metadata Concept Enabling Grids for E-sciencE • Entries – Representation of real world entities which we are attaching metadata to for describing them • Attribute – key/value pair – Type – The type (int, float, string,…) – Name/Key – The name of the attribute – Value - Value of an entry's attribute • Schema – A set of attributes • Collection – A set of entries associated with a schema • Metadata - List of attributes (including their values) associated with entries INFSO-RI-508833 ISSGC 09, Sophia-Antipolis, 10-07-09 40
  48. 48. Example: Movie Trailers Enabling Grids for E-sciencE • Movie trailers files (entries) saved on Grid Storage Elements and registered into File Catalogue • We want to add metadata to describe movie content. • A possible schema: – Title -- varchar – Runtime -- int – Cast -- varchar – LFN -- varchar • A metadata catalogue will be the repository of the movies’ metadata and will allow to find movies satisfying users’ queries INFSO-RI-508833 ISSGC 09, Sophia-Antipolis, 10-07-09
  49. 49. Trailer example Enabling Grids for E-sciencE Entry names Title Ru Cast LFN 8c3315c1-811f-4823-a778-60a203439689 My Best nti Julia 80 lfn:/grid/gilda/movies/ Friend’s Roberts mybfwed.avi wedding me Kirsten 51a18b7a-fd21-4b2c-aa74-4c53ee64846a Spider-man 2 120 lfn:/grid/gilda/movies/ Dunst spiderman2.avi 401e6df4-c1be-4822-958c-ce3eb5c54fcb The God Father 113 Al pacino lfn:/grid/gilda/movies/ godfather.avi INFSO-RI-508833 ISSGC 09, Sophia-Antipolis, 10-07-09 42
  50. 50. Trailer example Enabling Grids for E-sciencE Attribute Entry names Title Ru Cast LFN 8c3315c1-811f-4823-a778-60a203439689 My Best nti Julia 80 lfn:/grid/gilda/movies/ Friend’s Roberts mybfwed.avi wedding me Kirsten 51a18b7a-fd21-4b2c-aa74-4c53ee64846a Spider-man 2 120 lfn:/grid/gilda/movies/ Dunst spiderman2.avi 401e6df4-c1be-4822-958c-ce3eb5c54fcb The God Father 113 Al pacino lfn:/grid/gilda/movies/ godfather.avi INFSO-RI-508833 ISSGC 09, Sophia-Antipolis, 10-07-09 42
  51. 51. Trailer example Enabling Grids for E-sciencE Schema Attribute Entry names Title Ru Cast LFN 8c3315c1-811f-4823-a778-60a203439689 My Best nti Julia 80 lfn:/grid/gilda/movies/ Friend’s Roberts mybfwed.avi wedding me Kirsten 51a18b7a-fd21-4b2c-aa74-4c53ee64846a Spider-man 2 120 lfn:/grid/gilda/movies/ Dunst spiderman2.avi 401e6df4-c1be-4822-958c-ce3eb5c54fcb The God Father 113 Al pacino lfn:/grid/gilda/movies/ godfather.avi INFSO-RI-508833 ISSGC 09, Sophia-Antipolis, 10-07-09 42
  52. 52. Trailer example Enabling Grids for E-sciencE Schema Attribute Entry names Title Ru Cast LFN 8c3315c1-811f-4823-a778-60a203439689 My Best nti Julia 80 lfn:/grid/gilda/movies/ Friend’s Roberts mybfwed.avi wedding me Kirsten 51a18b7a-fd21-4b2c-aa74-4c53ee64846a Spider-man 2 120 lfn:/grid/gilda/movies/ Dunst spiderman2.avi 401e6df4-c1be-4822-958c-ce3eb5c54fcb The God Father 113 Al pacino lfn:/grid/gilda/movies/ godfather.avi Entries INFSO-RI-508833 ISSGC 09, Sophia-Antipolis, 10-07-09 42
  53. 53. Trailer example Enabling Grids for E-sciencE Schema Attribute Entry names Title Ru Cast LFN 8c3315c1-811f-4823-a778-60a203439689 My Best nti Julia 80 lfn:/grid/gilda/movies/ Friend’s Roberts mybfwed.avi wedding me Kirsten 51a18b7a-fd21-4b2c-aa74-4c53ee64846a Spider-man 2 120 lfn:/grid/gilda/movies/ Dunst spiderman2.avi 401e6df4-c1be-4822-958c-ce3eb5c54fcb The God Father 113 Al pacino lfn:/grid/gilda/movies/ godfather.avi Collection /trailers Entries INFSO-RI-508833 ISSGC 09, Sophia-Antipolis, 10-07-09 42
  54. 54. Metadata service on the Grid Enabling Grids for E-sciencE INFSO-RI-508833 ISSGC 09, Sophia-Antipolis, 10-07-09 43
  55. 55. Metadata service on the Grid Enabling Grids for E-sciencE • Information about files -- but not only! INFSO-RI-508833 ISSGC 09, Sophia-Antipolis, 10-07-09 43
  56. 56. Metadata service on the Grid Enabling Grids for E-sciencE • Information about files -- but not only! • metadata can describe any grid entity/object – ex: JobIDs - add logging information to your jobs INFSO-RI-508833 ISSGC 09, Sophia-Antipolis, 10-07-09 43
  57. 57. Metadata service on the Grid Enabling Grids for E-sciencE • Information about files -- but not only! • metadata can describe any grid entity/object – ex: JobIDs - add logging information to your jobs • monitoring of running applications: – ex: ongoing results from running jobs can be published on the metadata server INFSO-RI-508833 ISSGC 09, Sophia-Antipolis, 10-07-09 43
  58. 58. Metadata service on the Grid Enabling Grids for E-sciencE • Information about files -- but not only! • metadata can describe any grid entity/object – ex: JobIDs - add logging information to your jobs • monitoring of running applications: – ex: ongoing results from running jobs can be published on the metadata server • Inputset for a storm of parametric jobs INFSO-RI-508833 ISSGC 09, Sophia-Antipolis, 10-07-09 43
  59. 59. Metadata service on the Grid Enabling Grids for E-sciencE • Information about files -- but not only! • metadata can describe any grid entity/object – ex: JobIDs - add logging information to your jobs • monitoring of running applications: – ex: ongoing results from running jobs can be published on the metadata server • Inputset for a storm of parametric jobs • information exchanging among grid peers – ex: producers/consumers job collections: master jobs produce data to be analyzed; slave jobs query the metadata server to retrieve input to “consume” INFSO-RI-508833 ISSGC 09, Sophia-Antipolis, 10-07-09 43
  60. 60. Metadata service on the Grid Enabling Grids for E-sciencE • Information about files -- but not only! • metadata can describe any grid entity/object – ex: JobIDs - add logging information to your jobs • monitoring of running applications: – ex: ongoing results from running jobs can be published on the metadata server • Inputset for a storm of parametric jobs • information exchanging among grid peers – ex: producers/consumers job collections: master jobs produce data to be analyzed; slave jobs query the metadata server to retrieve input to “consume” • Simplified DB access on the grid – Grid applications that needs structured data can model their data schemas as metadata INFSO-RI-508833 ISSGC 09, Sophia-Antipolis, 10-07-09 43
  61. 61. Monitoring of running application Enabling Grids for E-sciencE SE showing results as long as they are produced W N CE WN Metadata Catalogue /results collection WN Workload Manager Scientist/Developer Customer/ submitting jobs Scientist INFSO-RI-508833 ISSGC 09, Sophia-Antipolis, 10-07-09 44
  62. 62. Inputset for parametric jobs Enabling Grids for E-sciencE • /grid/my_simulation/input ---------------------------------------------------------------------------------------------------- |entry     |x1        |x2        |y1        |y2        |step      |isTaken   |found     |output    | |--------------------------------------------------------------------------------------------------| |1         |9453.1    |9453.32   |-439.93   |-439.91   |0.0006    |JobID1234 |No pillars|          | |2         |9342.13   |3435      |3423      |2343.2    |0.003     |No        |          |          | |3 |34254.3 |342342 |432.43 |132 |0.002 |No | | | | ...... and so on | ---------------------------------------------------------------------------------------------------- • This collection lists all the parameter set to be run on the Grid • On the WN, one of the inputset is selected and “isTaken” is set = JOB_ID of the job that has fetched it • Results is also written in the “found” column to monitor the simulation • so users can check the simulation from a UI, querying the metadata server, or from a WebPage (using APIs for ex) • StdOutput can be copied also into the “output” text column INFSO-RI-508833 ISSGC 09, Sophia-Antipolis, 10-07-09 45
  63. 63. A possible parameter-get.sh script Enabling Grids for E-sciencE #!/bin/bash # Find the first set of parameters that has not been taken by noone ID=`mdcli find /grid/my_simulation/input 'isTaken="No"' | head -1` # Exit if all the parameters set has been already analyzed if [ "$ID" = "" ]; then exit 1; fi # set isTaken as its JOB_ID so that no one else will analyze the same set of parameter mdcli setattr /grid/my_simulation/input/$ID isTaken `echo $GLITE_WMS_JOBID` # retrieve the set of the parameter to be scanned X1=`mdcli getattr /grid/my_simulation/input/$ID x1 | tail -1` Y1=`mdcli getattr /grid/my_simulation/input/$ID y1 | tail -1` X2=`mdcli getattr /grid/my_simulation/input/$ID x2 | tail -1` Y2=`mdcli getattr /grid/my_simulation/input/$ID y2 | tail -1` STEP=`mdcli getattr /grid/my_simulation/input/$ID step | tail -1` # Run the scan with the proper parameter and save the output to output.txt java -cp issgc_sfk_nesc.jar:sfkscanner.jar  uk.ac.nesc.toe.sfk.radar.Scanner $X1 $Y1 $X2 $Y2 $STEP > output.txt # the Scanner class returns the writing "No pillars found in this area" or "Found area:" - so this will give useful info for monitoring during the run mdcli setattr /grid/my_simulation/input/$ID found `cat output.txt | grep -i found` # save the output (and the pillar text if found) on the metadata server mdcli setattr /grid/my_simulation/input/$ID output `cat output.txt` INFSO-RI-508833 ISSGC 09, Sophia-Antipolis, 10-07-09 46
  64. 64. Use a Metadata services to exchange data Enabling Grids for E-sciencE among running jobs INFSO-RI-508833 ISSGC 09, Sophia-Antipolis, 10-07-09
  65. 65. Use a Metadata services to exchange data Enabling Grids for E-sciencE among running jobs • Suppose we have two sets of jobs: – Producers: they generate a file, store on a SE, register it onto the LFC File Catalogue assigning a LFN – Consumers: they will take a LFN, download the file and elaborate it INFSO-RI-508833 ISSGC 09, Sophia-Antipolis, 10-07-09
  66. 66. Use a Metadata services to exchange data Enabling Grids for E-sciencE among running jobs • Suppose we have two sets of jobs: – Producers: they generate a file, store on a SE, register it onto the LFC File Catalogue assigning a LFN – Consumers: they will take a LFN, download the file and elaborate it • A Metadata collection can be used to share the information generated by the Producers; it could act as a “bag-of-LFNs” (bag-of-task model) from which Consumers can fetch file for further elaboration INFSO-RI-508833 ISSGC 09, Sophia-Antipolis, 10-07-09
  67. 67. Information exchanging among grid peers Enabling Grids for E-sciencE SE Producers jobs Consumers jobs put LFN W fetch LFN N W N CE WN Metadata WN Catalogue /bag-of-LFNs WN collection WN CE Workload Manager Scientist/Developer submitting jobs INFSO-RI-508833 ISSGC 09, Sophia-Antipolis, 10-07-09 48
  68. 68. The AMGA Metadata Catalogue Enabling Grids for E-sciencE • Official metadata service for the gLite middleware – but no dependencies from gLite software – it can be used with other grid technologies/other environments • AMGA: Arda Metadata Grid Application • Provide a complete but simple interface, in order to make all users able to use it easily. • Designed with scalability in mind in order to deal with large number of entries – based on a lightweight and streamed text-based protocol, like HTTP/SMTP • Grid security is provided to grant different access levels to different users. • Flexible with support to dynamic schemas in order to serve several application domains • Simple installation by tar source, RPMs or Yum/YAIM INFSO-RI-508833 ISSGC 09, Sophia-Antipolis, 10-07-09 49
  69. 69. AMGA Features Enabling Grids for E-sciencE INFSO-RI-508833 ISSGC 09, Sophia-Antipolis, 10-07-09
  70. 70. AMGA Features Enabling Grids for E-sciencE • Dynamic Schemas – Schemas can be modified at runtime by client  Create, delete schemas  Add, remove attributes INFSO-RI-508833 ISSGC 09, Sophia-Antipolis, 10-07-09
  71. 71. AMGA Features Enabling Grids for E-sciencE • Dynamic Schemas – Schemas can be modified at runtime by client  Create, delete schemas  Add, remove attributes • AMGA collections are hierarchical organized – Collections can contain sub-collections – Sub-collections can inherit/extend parent collection’ schema INFSO-RI-508833 ISSGC 09, Sophia-Antipolis, 10-07-09
  72. 72. AMGA Features Enabling Grids for E-sciencE • Dynamic Schemas – Schemas can be modified at runtime by client  Create, delete schemas  Add, remove attributes • AMGA collections are hierarchical organized – Collections can contain sub-collections – Sub-collections can inherit/extend parent collection’ schema • Flexible Queries – SQL-like query language – Different join type (inner, outer, left, right) between schemas are provided selectattr /gLibrary:FileName /gLAudio:Author /gLAudio:Album '/gLibrary:FILE=/gLAudio:FILE and like(/gLibrary:FileName, “%.mp3")‘ INFSO-RI-508833 ISSGC 09, Sophia-Antipolis, 10-07-09
  73. 73. AMGA Features Enabling Grids for E-sciencE • Dynamic Schemas – Schemas can be modified at runtime by client  Create, delete schemas  Add, remove attributes • AMGA collections are hierarchical organized – Collections can contain sub-collections – Sub-collections can inherit/extend parent collection’ schema • Flexible Queries – SQL-like query language – Different join type (inner, outer, left, right) between schemas are provided selectattr /gLibrary:FileName /gLAudio:Author /gLAudio:Album '/gLibrary:FILE=/gLAudio:FILE and like(/gLibrary:FileName, “%.mp3")‘  Support for Views, Constraints, Indexes INFSO-RI-508833 ISSGC 09, Sophia-Antipolis, 10-07-09
  74. 74. Example Enabling Grids for E-sciencE INFSO-RI-508833 ISSGC 09, Sophia-Antipolis, 10-07-09
  75. 75. AMGA Security Enabling Grids for E-sciencE INFSO-RI-508833 ISSGC 09, Sophia-Antipolis, 10-07-09 52
  76. 76. AMGA Security Enabling Grids for E-sciencE • Unix style permissions - users and groups INFSO-RI-508833 ISSGC 09, Sophia-Antipolis, 10-07-09 52
  77. 77. AMGA Security Enabling Grids for E-sciencE • Unix style permissions - users and groups • ACLs – Per-collection or per-entry (table row). INFSO-RI-508833 ISSGC 09, Sophia-Antipolis, 10-07-09 52
  78. 78. AMGA Security Enabling Grids for E-sciencE • Unix style permissions - users and groups • ACLs – Per-collection or per-entry (table row). • Secure client/server connections – SSL INFSO-RI-508833 ISSGC 09, Sophia-Antipolis, 10-07-09 52
  79. 79. AMGA Security Enabling Grids for E-sciencE • Unix style permissions - users and groups • ACLs – Per-collection or per-entry (table row). • Secure client/server connections – SSL • Client Authentication based on – Username/password – General X509 certificates (DN based) – Grid-proxy certificates (DN based) INFSO-RI-508833 ISSGC 09, Sophia-Antipolis, 10-07-09 52
  80. 80. AMGA Security Enabling Grids for E-sciencE • Unix style permissions - users and groups • ACLs – Per-collection or per-entry (table row). • Secure client/server connections – SSL • Client Authentication based on – Username/password – General X509 certificates (DN based) – Grid-proxy certificates (DN based) • VOMS support: – VO attribute maps to defined AMGA user – VOMS Role maps to defined AMGA user – VOMS Group maps to defined AMGA group INFSO-RI-508833 ISSGC 09, Sophia-Antipolis, 10-07-09 52
  81. 81. AMGA Implementation Enabling Grids for E-sciencE • C++ multiprocess server – Backends  Oracle, MySQL 4/5, PostgreSQL, SQLite – Front Ends TCP text streaming • High performance • Client API for C++, Java, Python, Perl, PHP SOAP (deprecated) • Interoperability • AMGA server runs on SLC3/4, • Scalability Fedora Core, Gentoo, Debian WS-DAIR Interface (new in AMGA 2.0) • WS-enable environment • Standalone Python Library implementation – Data stored on file system INFSO-RI-508833 ISSGC 09, Sophia-Antipolis, 10-07-09
  82. 82. AMGA Datatypes Enabling Grids for E-sciencE - Using the above datatypes you are sure that your metadata can be easily moved to all supported back- ends - If you do not care about DB portability, you can use, in principle, as entry attribute type ALL the datatypes supported by the back-end, even the more esoteric ones (PostgreSQL Network Address type or Geometric ones) INFSO-RI-508833 ISSGC 09, Sophia-Antipolis, 10-07-09
  83. 83. Accessing AMGA from UI/WNs Enabling Grids for E-sciencE • TCP Streaming Front-end – mdcli & mdclient CLI and C++ API (md_cli.h, MD_Client.h) – Java Client API and command line mdjavaclient.sh & mdjavacli.sh (also under Windows !!) – Python and Perl Client API – PHP Client API – NEW  developed totally by the GILDA team – INFN CT – AMGA Web Interface (AMGA WI) ---NEW  Developed totally by the GILDA team – INFN CT  Based on JAVA AMGA Standard APIs  Web Application using standard as JSP Custom Tags, Servlet • SOAP Frontend (WSDL) – C++ gSOAP – AXIS (Java) – ZSI (Python) INFSO-RI-508833 ISSGC 09, Sophia-Antipolis, 10-07-09
  84. 84. Advanced features: Metadata Replication Enabling Grids for E-sciencE • AMGA provides a replication/federation mechanisms • Motivation – Scalability – Support hundreds/thousands of concurrent users – Geographical distribution – Hide network latency – Reliability – No single point of failure – DB Independent replication – Heterogeneous DB systems – Disconnected computing – Off-line access (laptops) • Architecture – Asynchronous replication – Master-slave – writes only allowed on the master – Application level replication  Replicate Metadata commands – Partial replication – supports replication of only sub-trees of the metadata hierarchy INFSO-RI-508833 ISSGC 09, Sophia-Antipolis, 10-07-09
  85. 85. Metadata Replication: Use cases Enabling Grids for E-sciencE Full replication Partial replication Federation Proxy INFSO-RI-508833 ISSGC 09, Sophia-Antipolis, 10-07-09
  86. 86. Existing SQL DBs importing Enabling Grids for E-sciencE • Since AMGA 1.2.10, a new import feature allow to access existing DB table • Once imported into AMGA the tables from one or more DBs you want to access through AMGA, you can exploit many of the features brought to you by AMGA for your existing tables • Advantages: – your db tables can be accessed by grid users/applications, using grid authentication (VOMS proxies)/authorization with ACLs – exploiting AMGA federation features you can access several databases together from the Grid INFSO-RI-508833 ISSGC 09, Sophia-Antipolis, 10-07-09 58
  87. 87. Set up AMGA to access your tables Enabling Grids for E-sciencE • To remember: AMGA stores its own tables in its DB backend • To access and existing DB you have 2 option:  import the tables of the DB you want to access to into AMGA DB backend  viceversa, add AMGA DB backed tables to the DB you want to access to • Use the import command by root to “mount” you table into the AMGA collection hierarchy Query> whoami >> root Query> createdir /world Query> cd /world/ Query> import world.City /world/City Query> import world.Country /world/Country Query> import world.CountryLanguage /world/CountryLanguage INFSO-RI-508833 ISSGC 09, Sophia-Antipolis, 10-07-09 59
  88. 88. Set up AMGA to access your tables Enabling Grids for E-sciencE • Properly set up authorization on the imported tables: Query> acl_remove /world/City/ system:anyuser Query> acl_remove /world/Country system:anyuser Query> acl_add /world/ gilda:users rx Query> acl_show /world >> root rwx >> gilda:users rx >> system:anyuser rx Query> selectattr City:CountryCode City:Name 'like(City:Name, "Am%") limit 5' >> NLD >> Amsterdam >> NLD >> Amersfoort >> BRA >> Americana >> ECU >> Ambato >> IDN ‣ More information on existing DB access @: ‣ http://amga.web.cern.ch/amga/importing.html ‣ https://grid.ct.infn.it/twiki/bin/view/GILDA/AMGADBaccess INFSO-RI-508833 ISSGC 09, Sophia-Antipolis, 10-07-09 60
  89. 89. Native SQL Support Enabling Grids for E-sciencE • Objective: – implement native SQL query processing functionality in AMGA • Current Status: – direct SQL data statement in SQL92 Entry Level has been implemented in the 1.9 release  Including 4 statements: SELECT, DELETE, UPDATE and INSERT  ALL SQL commands should be issued in UPPERCASE • Entry name: – when a new entry is created with addentry/addentries, a name has to be assigned (filling the “file” column in the AMGA db backend)  in the INSERT implementation, it’s filled automatically with a random guid INFSO-RI-508833 ISSGC 09, Sophia-Antipolis, 10-07-09 61
  90. 90. Enabling Grids for E-sciencE Native SQL example Query> INSERT INTO `City` VALUES (1,'Kabul','AFG','Kabol',1780000) >> Operation Success Query> dir /world/City/ >> /world/City/80b4fe646ed11dda02100304873049 >> entry Query> SELECT COUNT (*) FROM /world/City >> 3429 Query> SELECT * FROM /world/City WHERE Name LIKE '%Catani%' >> 1472 >> Catania >> ITA >> Sisilia >> 337862 Query> SELECT /world/City:Name, /world/City:District, /world/Country:Name, / world/Country:Region, /world/Country:Continent FROM /world/City, /world/Country WHERE /world/City:Name LIKE '%Catani%' AND Code = 'ITA' >> Catania >> Sisilia >> Italy >> Southern Europe >> Europe INFSO-RI-508833 ISSGC 09, Sophia-Antipolis, 10-07-09 62
  91. 91. Biomed - MDM Enabling Grids for E-sciencE • Medical Data Manager – MDM – Store and access medical images and associated metadata on the Grid – Built on top of gLite 1.5 data management system – Demonstrated at last EGEE conference (October 05, Pisa) • Strong security requirements – Patient data is sensitive – Data must be encrypted – Metadata access must be restricted to authorized users • AMGA used as metadata server – Demonstrates authentication and encrypted access – Used as a simplified DB • More details at – https://uimon.cern.ch/twiki/bin/view/EGEE/DMEncryptedStorage INFSO-RI-508833 ISSGC 09, Sophia-Antipolis, 10-07-09
  92. 92. gMOD: grid Movie On Demand Enabling Grids for E-sciencE • gMOD provides a Video-On-Demand service • User chooses among a list of video and the chosen one is streamed in real time to the video client of the user’s workstation • For each movie a lot of details (Title, Runtime, Country, Release Date, Genre, Director, Case, Plot Outline) are stored and users can search a particular movie querying on one or more attributes • Two kind of users can interact with gMOD: TrailersManagers that can administer the db of movies (uploading new ones and attaching metadata to them); GILDA VO users (guest) can browse, search and choose a movie to be streamed. INFSO-RI-508833 ISSGC 09, Sophia-Antipolis, 10-07-09
  93. 93. gMOD under the hood Enabling Grids for E-sciencE • Built on top of gLite services: • Storage Elements, sited in different place, physically contain the movie files • LFC, the File Catalogue, keeps track in which Storage Element a particular movie is located • AMGA is the repository of the detailed information for each movie, and makes possible queries on them • The Virtual Organization Membership Service (VOMS) is used to assign the right role to the different users • The Workload Management System (WMS) is responsible to retrieve the chosen movie from the right Storage Element and stream it over the network down to the user’s desktop or laptop INFSO-RI-508833 ISSGC 09, Sophia-Antipolis, 10-07-09
  94. 94. gMOD interactions Enabling Grids for E-sciencE VOMS Metadata Catalogue Storage Elements GENIUS Portal AMGA get Role LFC File Catalogue User Workload Management System W CE N W N WN INFSO-RI-508833 ISSGC 09, Sophia-Antipolis, 10-07-09
  95. 95. gMOD screenshot Enabling Grids for E-sciencE gMOD is accesible through the Genius Portal (https://glite-demo.ct.infn.it) INFSO-RI-508833 ISSGC 09, Sophia-Antipolis, 10-07-09
  96. 96. gLibrary features Enabling Grids for E-sciencE • INFN-developed tool totally gLite based • It allows to store, organize, search and retrieve digital assets on a Grid environment with an intuitive front-end • What we mean by Digital Assets: INFSO-RI-508833 ISSGC’09, Taipei, 21st April 09 68
  97. 97. gLibrary as the iTunes for the Grid Enabling Grids for E-sciencE INFSO-RI-508833 ISSGC’09, Taipei, 21st April 09 69
  98. 98. Browse & Search Enabling Grids for E-sciencE • Assets can be browsed selecting a type (or category) and selecting one or more filters: – attributes of the selected types, chosen from a defined list, used to narrow the result set • Filter application is cascading and context-sensitive: the selection of a filter value dynamically influences subsequent filter values (“à la iTunes” browsing) – Classical search by description and keywords available too INFSO-RI-508833 ISSGC’09, Taipei, 21st April 09 70
  99. 99. Organize assets Enabling Grids for E-sciencE • “Types” and “Categories” definition by repository providers/admins: • and/or organized • Assets are organized by by collection: type: – Group together related – a list of specific attributes to assets of different type; describe each kind of asset to – Useful also to define be managed by the system subsets of assets – hierarchical (a child type shares belonging to the same and extend parent’s attributes) type – queried during searches – Multiple category assignment per asset (tagging) Collections INFSO-RI-508833 ISSGC’09, Taipei, 21st April 09 71
  100. 100. Store & Retrieve Enabling Grids for E-sciencE • Users can upload their local assets on one or more (creating replicas) Storage Elements of the Grid - Files already on grid SE can be registered in a gLibrary repository by the LFC File Catalogue browser • Download from SEs to the users’ laptop/desktop: - selection of a replica link from a list • Transfers are handled from the browser over HTTP/ HTTPS provided that users have their own X.509 Grid Certificate imported INFSO-RI-508833 ISSGC’09, Taipei, 21st April 09 72
  101. 101. gLibrary architecture Enabling Grids for E-sciencE INFSO-RI-508833 ISSGC’09, Taipei, 21st April 09 73
  102. 102. Features Enabling Grids for E-sciencE • Implemented as Web 2.0 application – AJAX and Javascript are strongly used to offer a desktop like user experience – Business logic implemented using PHP 5 OOP support EGEE-II INFSO-RI-031688 Antonio Calanducci ISGC09 Grid Tutorial, Taipei 18-04-2009 74
  103. 103. Technologies used Enabling Grids for E-sciencE INFSO-RI-508833 ISSGC 09, Sophia-Antipolis, 10-07-09 75
  104. 104. Technologies used Enabling Grids for E-sciencE • Web standards: – Javascript/AJAX/JSON on the client side – PHP5 classes to implement business logic on the server side INFSO-RI-508833 ISSGC 09, Sophia-Antipolis, 10-07-09 75
  105. 105. Technologies used Enabling Grids for E-sciencE • Web standards: – Javascript/AJAX/JSON on the client side – PHP5 classes to implement business logic on the server side INFSO-RI-508833 ISSGC 09, Sophia-Antipolis, 10-07-09 75
  106. 106. Technologies used Enabling Grids for E-sciencE • Web standards: – Javascript/AJAX/JSON on the client side – PHP5 classes to implement business logic on the server side • Grid technologies: – Storage Element SRM interface to get the TURLs (Transfer URLs) – Transfers handled with GridFTP (first release) and X.509 cert auth HTTPS – X.509 based Globus Security Infrastructure with the VOMS extensions to handle authentication and authorization (ACL based) on Metadata and Storage Elements – All grid services implemented with the EGEE gLite middleware (DPM Storage Elements, AMGA Metadata Catalogue, LFC File Catalogue, VOMS Services) INFSO-RI-508833 ISSGC 09, Sophia-Antipolis, 10-07-09 75
  107. 107. De Roberto cultural heritage Enabling Grids for E-sciencE • De Roberto, an Italian writer of the XIX/XX century, born in Naples, but spending his life in Catania, has left to the humanistic community numerous works • Those are made up of valuable and hard-to-manage pieces: manuscripts, typescripts, drafts with handwritten corrections, magazines, cuts, sketches, photos, etc. INFSO-RI-508833 ISSGC’09, Taipei, 21st April 09 76
  108. 108. Enabling Grids for E-sciencE INFSO-RI-508833 ISSGC’09, Taipei, 21st April 09 77
  109. 109. Digitalize to preserve them Enabling Grids for E-sciencE • Some sheets are damaged (mold, crumbed pieces) and need physical restoration • Digitalization to avoid the loss of this works, some of them still unpublished and relevant for the humanistic communities INFSO-RI-508833 ISSGC’09, Taipei, 21st April 09 78
  110. 110. Enabling Grids for E-sciencE INFSO-RI-508833 ISSGC’09, Taipei, 21st April 09 79
  111. 111. Acquisition stage Enabling Grids for E-sciencE INFSO-RI-508833 ISSGC’09, Taipei, 21st April 09 80
  112. 112. Acquisition stage Enabling Grids for E-sciencE • Digitalization of manuscripts, typescripts, printed works – TIFF Files, one per page, 600 dpi, about 100MB for A3  High resolution scans for in-depth examination – PDF, one per work, 300 dpi, varying file sizes 40-400MB  Overall examination of works – 8000 sheets/scans, 3 Terabyte of disk space – Different physical formats, A3/A4/custom size INFSO-RI-508833 ISSGC’09, Taipei, 21st April 09 80
  113. 113. Acquisition stage Enabling Grids for E-sciencE • Digitalization of manuscripts, typescripts, printed works – TIFF Files, one per page, 600 dpi, about 100MB for A3  High resolution scans for in-depth examination – PDF, one per work, 300 dpi, varying file sizes 40-400MB  Overall examination of works – 8000 sheets/scans, 3 Terabyte of disk space – Different physical formats, A3/A4/custom size • Embedded Metadata – TIFF with embedded metadata to provide scan physical features and information about the content  ImageWidth, ImageHeight, XResolution, FileSize, CreationDate, ModifyDate  Description, Keywords, CaptionWriter, Title, Author, Copyright Status, Copyright Notice – Added with Photoshop after the digitalization phase (Adobe XMP format) INFSO-RI-508833 ISSGC’09, Taipei, 21st April 09 80
  114. 114. Goals and requirements Enabling Grids for E-sciencE INFSO-RI-508833 ISSGC’09, Taipei, 21st April 09 81
  115. 115. Goals and requirements Enabling Grids for E-sciencE • Make those works accessible to the humanistic communities – Always on-line: 24 x 365 – Available from everywhere – Simple and easy-to-use interface for non-expert people INFSO-RI-508833 ISSGC’09, Taipei, 21st April 09 81
  116. 116. Goals and requirements Enabling Grids for E-sciencE • Make those works accessible to the humanistic communities – Always on-line: 24 x 365 – Available from everywhere – Simple and easy-to-use interface for non-expert people • Quickly find the desired document – Document organization according the physical and semantic metadata Organization by type/collections Dynamic filtering of search result sets according the selection of one or more document metadata INFSO-RI-508833 ISSGC’09, Taipei, 21st April 09 81
  117. 117. Goals and requirements Enabling Grids for E-sciencE • Make those works accessible to the humanistic communities – Always on-line: 24 x 365 – Available from everywhere – Simple and easy-to-use interface for non-expert people • Quickly find the desired document – Document organization according the physical and semantic metadata Organization by type/collections Dynamic filtering of search result sets according the selection of one or more document metadata • Long-term preservation (digital preservation) – Multiple copies (replicas) spread in different geographical sites – Reliability of storage systems and replica redundancy to achieve secure preservation INFSO-RI-508833 ISSGC’09, Taipei, 21st April 09 81
  118. 118. What Data Grids can offer to them Enabling Grids for E-sciencE INFSO-RI-508833 ISSGC’09, Taipei, 21st April 09 82
  119. 119. What Data Grids can offer to them Enabling Grids for E-sciencE • store the 8000 scans of De Roberto Heritage ----> Data Grid Storage Elements INFSO-RI-508833 ISSGC’09, Taipei, 21st April 09 82
  120. 120. What Data Grids can offer to them Enabling Grids for E-sciencE • store the 8000 scans of De Roberto Heritage ----> Data Grid Storage Elements • enable an ubiquitous and 24/24h access to scientists ---> Web Application INFSO-RI-508833 ISSGC’09, Taipei, 21st April 09 82
  121. 121. What Data Grids can offer to them Enabling Grids for E-sciencE • store the 8000 scans of De Roberto Heritage ----> Data Grid Storage Elements • enable an ubiquitous and 24/24h access to scientists ---> Web Application • document organization for a quick search ---> Metadata Services INFSO-RI-508833 ISSGC’09, Taipei, 21st April 09 82
  122. 122. What Data Grids can offer to them Enabling Grids for E-sciencE • store the 8000 scans of De Roberto Heritage ----> Data Grid Storage Elements • enable an ubiquitous and 24/24h access to scientists ---> Web Application • document organization for a quick search ---> Metadata Services • long-term digital preservation of data ---> redundancy through Replicas of files on several Storage Elements INFSO-RI-508833 ISSGC’09, Taipei, 21st April 09 82
  123. 123. What Data Grids can offer to them Enabling Grids for E-sciencE • store the 8000 scans of De Roberto Heritage ----> Data Grid Storage Elements • enable an ubiquitous and 24/24h access to scientists ---> Web Application • document organization for a quick search ---> Metadata Services • long-term digital preservation of data ---> redundancy through Replicas of files on several Storage Elements • simple and easy-to-use system for searches, organization, upload and download of digitalized documents on the Grid INFSO-RI-508833 ISSGC’09, Taipei, 21st April 09 82
  124. 124. What Data Grids can offer to them Enabling Grids for E-sciencE • store the 8000 scans of De Roberto Heritage ----> Data Grid Storage Elements • enable an ubiquitous and 24/24h access to scientists ---> Web Application • document organization for a quick search ---> Metadata Services • long-term digital preservation of data ---> redundancy through Replicas of files on several Storage Elements • simple and easy-to-use system for searches, organization, upload and download of digitalized documents on the Grid -----> INFSO-RI-508833 ISSGC’09, Taipei, 21st April 09 82
  125. 125. What Data Grids can offer to them Enabling Grids for E-sciencE • store the 8000 scans of De Roberto Heritage ----> Data Grid Storage Elements • enable an ubiquitous and 24/24h access to scientists ---> Web Application • document organization for a quick search ---> Metadata Services • long-term digital preservation of data ---> redundancy through Replicas of files on several Storage Elements • simple and easy-to-use system for searches, organization, upload and download of digitalized documents on the Grid -----> INFSO-RI-508833 ISSGC’09, Taipei, 21st April 09 82
  126. 126. Data Grids to preserve Cultural Heritage Enabling Grids for E-sciencE INFSO-RI-508833 ISSGC’09, Taipei, 21st April 09 83
  127. 127. Live Demo Enabling Grids for E-sciencE INFSO-RI-508833 ISSGC’09, Taipei, 21st April 09 84
  128. 128. Data Management Services Summary Enabling Grids for E-sciencE • Storage Element – save date and provide a common interface – Storage Resource Manager (SRM) dCache, DPM, ... – Native Access protocols rfio, dcap, nfs, … – Transfer protocols gsiftp, https, … • Catalogs – keep track where data are stored – File Catalog LCG File Catalog (LFC) – Replica Catalog – Metadata Catalog AMGA Metadata Catalogue • Data Movement – schedules reliable file transfer – File Transfer Service gLite FTS (manages physical transfers) INFSO-RI-508833 ISSGC 09, Sophia-Antipolis, 10-07-09 85
  129. 129. References Enabling Grids for E-sciencE • gLite documentation homepage – http://glite.web.cern.ch/glite/documentation/default.asp • DM subsystem documentation – http://egee-jra1-dm.web.cern.ch/egee-jra1-dm/doc.htm • LFC and DPM documentation – https://uimon.cern.ch/twiki/bin/view/LCG/ DataManagementDocumentation INFSO-RI-508833 ISSGC 09, Sophia-Antipolis, 10-07-09 86
  130. 130. References Enabling Grids for E-sciencE • AMGA Web Site http://cern.ch/amga • AMGA Manual http://amga.web.cern.ch/amga/downloads/amga-manual_1_3_0.pdf • AMGA API Javadoc http://amga.web.cern.ch/amga/javadoc/index.html • AMGA Web Frontend http://gilda-forge.ct.infn.it/projects/amgawi/ • AMGA Basic Tutorial https://grid.ct.infn.it/twiki/bin/view/GILDA/AMGAHandsOn • More information on existing DB access @: – http://amga.web.cern.ch/amga/importing.html – https://grid.ct.infn.it/twiki/bin/view/GILDA/AMGADBaccess INFSO-RI-508833 ISSGC 09, Sophia-Antipolis, 10-07-09 87
  131. 131. gLibrary references Enabling Grids for E-sciencE • gLibray project homepage: – https://glibrary.ct.infn.it/ • gLibrary paper: – https://glibrary.ct.infn.it/glibrary/downloads/gLibrary_paper_v2.pdf EGEE-II INFSO-RI-031688 Antonio Calanducci ISGC09 Grid Tutorial, Taipei 18-04-2009 88
  1. A particular slide catching your eye?

    Clipping is a handy way to collect important slides you want to go back to later.

×