gLite Information System overview Leandro N. Ciuffo INFN-Catania (Italy) EELA-2 Workshop at the Open Access 2008 Lilongwe - Malawi, 13.11.2008
Motivation / Introduction
Information system in practice
Users can access electrical power coming from different (and heterogeneous) sources
The power Grid paradigm Electric Power Grid System coal wind hydroelectric nuclear
Users can access storage and computing resources coming from different (and heterogeneous) sources
The computing Grid approach Computing Grid System The resources shared within the Grid can be physical objects (CPUs, storage devices) or logical resources (computing queues, distributed file systems)
What resources are available to the Grid?
What is their current status?
What is out there?
Discover newly added resources
Make updated information about the resources available
Monitor resource load and status
How does it work?
Collecting information from the resources and publishing fresh data periodically.
Adopting a data model that MUST be well known to all Grid components.
The Info. system scope
Retrieve information about resources:
Where can I run my job?
Where can I copy my files?
Which software packages are available on a given CE?
Publish information about the resources and services they provide
WMS: matching job requirements and allocating the resources
Monitoring Services: retrieving information about the status and availability of resources.
How is it used and by whom?
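The matchmaking role described above can be illustrated with a minimal sketch. It is not the WMS algorithm: the `ComputingElement` class, its fields, and the `match` function are hypothetical names chosen for illustration, and the ranking rule (most free CPUs first) is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class ComputingElement:
    """Hypothetical, simplified view of what the information system publishes for a CE."""
    name: str
    free_cpus: int
    software_tags: set = field(default_factory=set)
    allowed_vos: set = field(default_factory=set)


def match(ces, vo, required_tag, min_free_cpus=1):
    """Return the names of CEs that satisfy the job requirements, most free CPUs first."""
    candidates = [
        ce for ce in ces
        if vo in ce.allowed_vos
        and required_tag in ce.software_tags
        and ce.free_cpus >= min_free_cpus
    ]
    candidates.sort(key=lambda ce: ce.free_cpus, reverse=True)
    return [ce.name for ce in candidates]


# Made-up resources, for illustration only.
ces = [
    ComputingElement("ce-a.example.org", 4, {"ROOT"}, {"gilda"}),
    ComputingElement("ce-b.example.org", 12, {"ROOT", "GEANT4"}, {"gilda"}),
    ComputingElement("ce-c.example.org", 0, {"ROOT"}, {"gilda"}),
    ComputingElement("ce-d.example.org", 20, {"ROOT"}, {"eela"}),
]
print(match(ces, vo="gilda", required_tag="ROOT"))
```

The point of the sketch is that matching is only possible because every resource publishes the same attributes in the same format.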
Stands for “Grid Laboratory Uniform Environment”
Provides a standardized description of a Grid computing system
It is not tied to any particular implementation
The Grid components are represented as objects which have attributes and relations to other objects
GLUE Schema - Overview
The Grid is a paradigm of distributed computing that enables the coordination of resources and services not subject to centralized control. These resources are geographically dispersed, span multiple trust domains and are heterogeneous.
Resources can be dynamically contributed by different owner institutions, so a precise and shared description of resources among information consumers and resource providers is necessary.
This description should also be common to different Grid infrastructures in order to contribute to the interoperability among them.
GLUE Schema – Why do we need it?
GLUE Schema – Core entities
Site entity – Properties:
GLUE Schema – Site properties
Property | Type | Description
UniqueID | string | Unique Identifier of the Site
Name | string | Human-readable name
Description | string | Short description of this site
EmailContact | string | The main email contact for the site. Syntax rule: "mailto:" followed by a list of email addresses separated by a comma
UserSupportContact | string | E-mail addresses of the support service
SysAdminContact | string | E-mail addresses of the system administrator
SecurityContact | string | E-mail addresses of the security manager
Location | string | Geographical location of this site (e.g., city, state, country)
Latitude | real32 | The position of a place north or south of the equator measured from -90º to 90º, with positive values going north and negative values going south
Longitude | real32 | The position of a place east or west of Greenwich, England measured from -180º to 180º, with positive values going east and negative values going west
Web | uri | The URI identifying a web page with more information about this site
Sponsor | string | VO sponsoring the site; the syntax should allow the expression of the percentage of sponsorship
OtherInfo | string | This attribute is to be used to publish info that does not fit in any other attribute of the site entity
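A few of the Site properties carry explicit syntax rules (the "mailto:" prefix of EmailContact, the ranges of Latitude and Longitude). The sketch below shows how a consumer might check them; the function names are hypothetical and the checks cover only the rules stated in the table, not the full schema.

```python
def parse_email_contact(value):
    """Split an EmailContact value into addresses.

    Expects "mailto:" followed by a comma-separated list of addresses,
    as stated in the Site properties table.
    """
    prefix = "mailto:"
    if not value.startswith(prefix):
        raise ValueError("EmailContact must start with 'mailto:'")
    addresses = [a.strip() for a in value[len(prefix):].split(",")]
    if not all(addresses):
        raise ValueError("EmailContact contains an empty address")
    return addresses


def valid_coordinates(latitude, longitude):
    """Check the ranges given for the Latitude and Longitude properties."""
    return -90.0 <= latitude <= 90.0 and -180.0 <= longitude <= 180.0


# Example values are made up for illustration.
print(parse_email_contact("mailto:admin@example.org, support@example.org"))
print(valid_coordinates(37.5, 15.1))
```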
Service entity – Properties:
GLUE Schema – Service properties
Property | Type | Description
UniqueID | string | Unique Identifier of this service
Name | string | Human-friendly name
Type | serviceType_t | The service type
Version | string | Version of the service: <major version number>.<minor version number>.<patch version number>
Endpoint | uri | Network endpoint for this service
Status | serviceStatus_t | Status of the service. String enumeration: OK, Warning, Critical, Unknown, Other
StatusInfo | string | Textual explanation for the status of the service
WSDL | uri | URI of the WSDL describing the service
Semantics | uri | URL of detailed description
StartTime | dateTime_xs_t | The timestamp related to last start time of this service
Owner | string | Owner of the service
AccessControlBase.Rule | ACL_t | Authorization rule for this entity
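Since the schema describes Grid components as objects with typed attributes, a subset of the Service entity can be sketched as a small data class. This is an illustrative model only: the Python class and field names are hypothetical, and only the status enumeration and the version format are taken from the table.

```python
from dataclasses import dataclass
from enum import Enum


class ServiceStatus(Enum):
    """The string enumeration listed for the Status property."""
    OK = "OK"
    WARNING = "Warning"
    CRITICAL = "Critical"
    UNKNOWN = "Unknown"
    OTHER = "Other"


@dataclass
class Service:
    """Hypothetical in-memory view of a few Service properties."""
    unique_id: str
    name: str
    type: str
    version: str   # "<major>.<minor>.<patch>"
    endpoint: str
    status: ServiceStatus

    def version_tuple(self):
        """Split the version string into (major, minor, patch) integers."""
        major, minor, patch = (int(part) for part in self.version.split("."))
        return major, minor, patch


# Made-up service record, for illustration only.
svc = Service(
    unique_id="svc-001",
    name="Example service",
    type="example-type",
    version="3.1.0",
    endpoint="ldap://host.example.org:2170",
    status=ServiceStatus("OK"),
)
print(svc.version_tuple(), svc.status.name)
```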
GLUE Schema – Cluster entity
GLUE Schema – CE entity
Stands for “Lightweight Directory Access Protocol”
It is a protocol that defines the method by which directory data is accessed
Optimized for reading, browsing and searching information (‘write-once-read-many-times’ service)
Data is represented as a hierarchy of objects (entities) forming a tree structure
Directory Information Tree (DIT)
LDAP - Overview
Distinguished Name (DN)
Unique name (path) that unambiguously identifies a single entry
LDAP – Directory Information Tree
dc=grid (root of the DIT)
    c=Brazil
    c=Italy
        o=INFN
            ou=Catania
                cn=Leandro Ciuffo
            ou=Roma
    c=Spain
Example DN: cn=Leandro Ciuffo,ou=Catania,o=INFN,c=Italy,dc=grid
String | Attribute Type
DC | domainComponent
CN | commonName
C | countryName
OU | organizationalUnitName
O | organizationName
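To make the link between a DN and its position in the tree concrete, here is a minimal sketch that splits a DN into its components and walks it from the root down. The helper names are hypothetical, and the parsing is deliberately naive: it assumes no escaped commas or multi-valued components, which a real LDAP library would handle.

```python
def parse_dn(dn):
    """Split a DN into (attribute, value) pairs, leaf first (naive: no escaping)."""
    rdns = []
    for part in dn.split(","):
        attr, _, value = part.strip().partition("=")
        rdns.append((attr.strip(), value.strip()))
    return rdns


def parent_dn(dn):
    """Return the DN of the parent entry, or an empty string for the root."""
    return dn.split(",", 1)[1] if "," in dn else ""


dn = "cn=Leandro Ciuffo,ou=Catania,o=INFN,c=Italy,dc=grid"

# A DN is written leaf first; reversing it gives the path from the root of the DIT.
for depth, (attr, value) in enumerate(reversed(parse_dn(dn))):
    print("    " * depth + f"{attr}={value}")

print(parent_dn(dn))
```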
The tree hierarchy is described by text files in LDIF (LDAP Data Interchange Format)
LDAP – Directory Information Tree
dn: dc=example,dc=root
dc: example
description: My company
objectClass: dcObject
objectClass: organization
o: Example, Inc.

## FIRST Level hierarchy - people
# this is an ENTRY sequence and is preceded by a BLANK line
dn: ou=people,dc=example,dc=root
ou: people
description: All people in organisation
objectClass: organizationalUnit

## SECOND Level hierarchy - people entries
# this is an ENTRY sequence and is preceded by a BLANK line
dn: cn=Joe Schmo,ou=people,dc=example,dc=root
objectclass: inetOrgPerson
cn: Joe Schmo
sn: Schmo
uid: jschmo
mail: firstname.lastname@example.org
ou: sales
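The structure of the LDIF example (entries separated by blank lines, "#" comment lines, "attribute: value" pairs, repeated attributes for multiple values) can be shown with a small reader. This is a simplified sketch, not a complete LDIF parser: it ignores line continuations and base64-encoded values, and the function name is hypothetical.

```python
def parse_ldif(text):
    """Parse a simplified LDIF text into a list of {attribute: [values]} dicts."""
    entries = []
    current = {}
    for line in text.splitlines():
        if line.startswith("#"):
            continue  # comment line
        if not line.strip():
            # A blank line closes the current entry, if one is open.
            if current:
                entries.append(current)
                current = {}
            continue
        attr, _, value = line.partition(":")
        # Attribute names are case-insensitive, so normalise them to lower case.
        current.setdefault(attr.strip().lower(), []).append(value.strip())
    if current:
        entries.append(current)
    return entries


SAMPLE = """\
dn: dc=example,dc=root
dc: example
objectClass: dcObject
objectClass: organization

# people branch
dn: ou=people,dc=example,dc=root
ou: people
objectClass: organizationalUnit
"""

for entry in parse_ldif(SAMPLE):
    print(entry["dn"][0], entry["objectclass"])
```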
An individual LDAP server might not store the entire DIT
Servers need to be linked together in order to form a distributed directory that contains the whole DIT
All client requests start at the global directory LDAP 1
LDAP – Referrals
LDAP 1 server: holds dc=example, with a referral for c=USA to LDAP 2
LDAP 2 server: holds c=USA,dc=example (o=CISCO, o=IBM), with a referral for o=IBM to LDAP 3
LDAP 3 server: holds o=IBM,c=USA,dc=example (ou=IT, ou=Web)
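The referral chain on this slide can be simulated in a few lines. The sketch below is a toy model, not LDAP client code: the `REFERRALS` table and the `resolve` function are hypothetical, and they only mimic how a request that starts at LDAP 1 is redirected until it reaches the server holding the requested subtree.

```python
# For each server, the subtrees it delegates and the server that holds them.
REFERRALS = {
    "LDAP 1": {"c=USA,dc=example": "LDAP 2"},
    "LDAP 2": {"o=IBM,c=USA,dc=example": "LDAP 3"},
    "LDAP 3": {},
}


def is_under(dn, base):
    """True if dn is the base entry itself or lies below it in the tree."""
    return dn == base or dn.endswith("," + base)


def resolve(dn, start="LDAP 1"):
    """Follow referrals from the start server; return the list of servers visited."""
    path = [start]
    server = start
    while True:
        for base, target in REFERRALS[server].items():
            if is_under(dn, base):
                server = target
                path.append(server)
                break
        else:
            # No referral applies, so this server answers the request.
            return path


print(resolve("ou=Web,o=IBM,c=USA,dc=example"))
print(resolve("o=CISCO,c=USA,dc=example"))
```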
gLite adopts the Globus Monitoring and Discovery Service (MDS) architecture as its Information System
It implements the GLUE Schema using OpenLDAP
Grid Resource Information Server (GRIS)
LDAP server at a site level
Local CEs and SEs run an Information Provider software, which collects info about the resource
Berkeley Database Information Index (BDII)
Top level database
Stores and publishes data gathered from the local GRISes
Uses LDAP as protocol and GLUE Schema as data model
MDS - Architecture
(Diagram labels: site-level BDII; Grid Index Information Server (GIIS))
BDII consists of 2 servers: one contains a read-only database and the other a write-only database
Every 2 minutes a cron job runs a script that collects info from the lower-level servers (GIIS)
Once updated, the 2 servers change roles
MDS - Architecture
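The two-database arrangement described above is a form of double buffering: queries are always served from one copy while the other is refreshed, and then the two swap roles. The following sketch illustrates the idea only; the class and method names are hypothetical and it does not reflect the actual BDII scripts.

```python
class TwoDatabaseIndex:
    """Toy model of two databases that alternate between read and write roles."""

    def __init__(self):
        self._dbs = [{}, {}]
        self._read = 0  # index of the database currently serving queries

    def query(self, key):
        """Answer from the read-only database; never touches the one being updated."""
        return self._dbs[self._read].get(key)

    def refresh(self, fetch):
        """Fill the write database from lower-level sources, then swap roles."""
        write = 1 - self._read
        self._dbs[write] = dict(fetch())
        self._read = write


index = TwoDatabaseIndex()
# The lambdas stand in for the periodic collection from lower-level servers.
index.refresh(lambda: {"ce-a.example.org": "4 free CPUs"})
print(index.query("ce-a.example.org"))
index.refresh(lambda: {"ce-a.example.org": "2 free CPUs"})
print(index.query("ce-a.example.org"))
```

Readers therefore never see a partially updated database, at the cost of data that can be up to one refresh cycle old.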
Exploring the GILDA top BDII: glite-rb.ct.infn.it (port: 2170)
LDAP browser showing BDII info
The default BDII is defined in the environment variable LCG_GFAL_INFOSYS
Querying a BDII server
The current EELA BDII is: lnx112.eela.if.ufrj.br
-x option indicates that simple authentication should be used;
-h and -p options precede the hostname and port, respectively;
-b option specifies the base entry at which the search starts in the LDAP tree
ldapsearch -x -h <BDII to query> -p 2170 -b mds-vo-name=local,o=grid
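The command above can also be assembled programmatically, which is convenient when the same query is sent to several BDIIs. The sketch below only builds the argument list and prints it; it does not contact any server. The function name is hypothetical, and the filter `(objectClass=GlueSE)` and attribute `GlueSEUniqueID` are assumed GLUE names used for illustration.

```python
import shlex


def ldapsearch_command(host, port=2170, base="mds-vo-name=local,o=grid",
                       ldap_filter=None, attrs=()):
    """Build the ldapsearch argument list using the options explained above."""
    cmd = ["ldapsearch", "-x", "-h", host, "-p", str(port), "-b", base]
    if attrs and not ldap_filter:
        # ldapsearch reads the first positional argument as the filter,
        # so an explicit match-all filter is needed before attribute names.
        ldap_filter = "(objectClass=*)"
    if ldap_filter:
        cmd.append(ldap_filter)
    cmd.extend(attrs)
    return cmd


cmd = ldapsearch_command(
    "lnx112.eela.if.ufrj.br",
    ldap_filter="(objectClass=GlueSE)",   # assumed GLUE object class
    attrs=["GlueSEUniqueID"],             # assumed GLUE attribute name
)
print(shlex.join(cmd))  # shell-quoted form, ready to paste into a terminal
```

Passing the list to `subprocess.run(cmd)` would execute the query on a machine where `ldapsearch` is installed.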
Perl script wrapping a set of LDAP commands
Avoids the need to execute raw LDAP queries
Allows users to retrieve information about Grid resources
lcg-infosites options
ce: The number of CPUs, running jobs and waiting jobs, and the names of the CEs are provided.
    -v 1: only the names of the queues are printed.
    -v 2: the RAM, the operating system and its version, and the processor of each CE are printed.
se: The names of the SEs and their used and available space are printed.
    -v 1: only the names of the SEs are printed.
closeSE: The names of the CEs where the user's VO is allowed to run are provided, together with their corresponding closest SEs.
lfc: The name of the machine hosting the LFC catalog is displayed.
tag: The names of the tags for the software installed at each site are printed, together with the corresponding CE.
all: Groups together the information provided by ce, se, lrc and rmc.
--is: If not specified, the BDII defined by default in the variable LCG_GFAL_INFOSYS is queried. To query another BDII without redefining this environment variable, specify this argument followed by the name of the BDII to query. All options accept this argument.
lcg-infosites examples
lcg-infosites --vo gilda se
Avail Space(Kb)  Used Space(Kb)  Type  SEs
----------------------------------------------------------
71630000         8497495         n.a   egee016.cnaf.infn.it
316420000        54806061        n.a   aliserv6.ct.infn.it
30510000         3198673         n.a   gilda-02.pd.infn.it
63290000         428553          n.a   iceage-se-01.ct.infn.it
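Output like the listing above is easy to post-process. The sketch below parses the four data rows from this slide and picks the SE with the most available space; the parser is hypothetical and assumes the column layout shown here (two numeric columns, a type, and the SE name).

```python
OUTPUT = """\
Avail Space(Kb) Used Space(Kb) Type SEs
----------------------------------------------------------
71630000 8497495 n.a egee016.cnaf.infn.it
316420000 54806061 n.a aliserv6.ct.infn.it
30510000 3198673 n.a gilda-02.pd.infn.it
63290000 428553 n.a iceage-se-01.ct.infn.it
"""


def parse_se_listing(text):
    """Extract the data rows; the header and separator lines are skipped."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 4 and fields[0].isdigit() and fields[1].isdigit():
            rows.append({
                "avail_kb": int(fields[0]),
                "used_kb": int(fields[1]),
                "type": fields[2],
                "se": fields[3],
            })
    return rows


rows = parse_se_listing(OUTPUT)
best = max(rows, key=lambda row: row["avail_kb"])
print(best["se"], best["avail_kb"])
```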
lcg-info options
--help: Prints the manual page and exits.
--list-attrs: Prints a list of the attributes that can be queried.
--list-ce: Lists the CEs which satisfy a query, or all the CEs if no query is given.
--list-se: Lists the SEs which satisfy a query, or all the SEs if no query is given.
--bdii: Specifies a BDII in the form <hostname>:<port>. If not given, the value of the environment variable LCG_GFAL_INFOSYS is used. If that is not defined, the command returns an error.
--sed: Prints the output in a "sed-friendly" format: "%" separates the CE (SE) identifier and the printed attributes, "&" separates the values of multi-valued attributes.
--quiet: Suppresses warning messages.
--attrs: Specifies the attributes whose values should be printed.
--vo: Restricts the output to CEs or SEs where the given VO is authorized. Mandatory when VO-dependent attributes are queried.
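The "sed-friendly" format is meant for scripts, so a short sketch of how one line might be split is given here. It assumes that "%" separates the identifier from each attribute and each attribute from the next, and that "&" separates the values of one attribute. The sample line, the attribute names and the function name are made up for illustration, not real command output.

```python
def parse_sed_line(line, attr_names):
    """Split one --sed output line into the identifier and a dict of value lists."""
    identifier, *values = line.rstrip("\n").split("%")
    record = {name: value.split("&") for name, value in zip(attr_names, values)}
    return identifier, record


# Hypothetical line for a query that printed two attributes.
line = "ce01.example.org:2119/jobmanager-lcgpbs-short%4%VO-gilda-GEANT4&VO-gilda-ROOT"
identifier, record = parse_sed_line(line, ["FreeCPUs", "Tag"])
print(identifier)
print(record)
```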