Enabling Grids for E-sciencE




                           Distributed Data and gLite

                           Steven Newhouse
                           Technical Director
                           CERN




www.eu-egee.org


EGEE-III INFSO-RI-222667                               EGEE and gLite are registered trademarks
The Data Deluge



    •   Astronomy
    •   Genomics
    •   Earth Observation
    •   Digitisation




[Figure: the Crab Nebula imaged in X-ray and optical light]
EGEE-III INFSO-RI-222667                                             Data Day - Grid School 2009   2
... And the LHC





High throughput data analysis



    • Analysing the data
          – Large ensemble calculations (100s to 10,000s of jobs)
          – Complex workflows – dependent on previous steps
    • High Throughput
          – Exploit distributed computing and storage resources
                 ▪ Data replicated (multiple locations)
                 ▪ Resources selected through a broker
                     • WMS: Workload Management System
                     • Higher-level tools: GANGA, DIANE, ...
          – Information system records the available resources
          – File catalogue records the location of replicated files
    • Data stored in files
          – Growing interest in relational data access
          – Stored on tape (long-term) or disk (immediate access)
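The brokering idea above can be illustrated with a toy Python sketch. It is not the gLite WMS matchmaking algorithm: the `Replica` type, the `INFO_SYSTEM` and `CATALOGUE` dictionaries, the site names, the logical file names and the ranking rule (all inputs on disk beats needing a tape recall, then most free cores) are hypothetical stand-ins for the information system, the file catalogue and the broker.

```python
from dataclasses import dataclass

# Hypothetical stand-ins for the services on the slide (illustration only).
@dataclass(frozen=True)
class Replica:
    site: str
    medium: str  # "disk" (immediate access) or "tape" (long-term storage)

# Toy information system: free cores currently advertised by each site.
INFO_SYSTEM = {"SiteA": 120, "SiteB": 0, "SiteC": 40}

# Toy file catalogue: logical file name -> where its replicas live.
CATALOGUE = {
    "lfn:/grid/demo/run1.dat": [
        Replica("SiteA", "tape"), Replica("SiteB", "disk"), Replica("SiteC", "disk"),
    ],
    "lfn:/grid/demo/run2.dat": [Replica("SiteA", "disk"), Replica("SiteC", "disk")],
}


def choose_site(inputs, info, catalogue):
    """Return the best site for a job reading `inputs`, or None if none qualifies.

    A site qualifies if it has free cores and holds a replica of every input.
    Sites with all inputs on disk rank ahead of sites needing a tape recall;
    ties are broken by the number of free cores.
    """
    candidates = []
    for site, free_cores in info.items():
        if free_cores <= 0:
            continue  # no spare capacity advertised
        all_on_disk = True
        for lfn in inputs:
            media = {r.medium for r in catalogue.get(lfn, []) if r.site == site}
            if not media:
                break  # this site has no replica of the file
            if "disk" not in media:
                all_on_disk = False
        else:  # loop finished without break: every input is available here
            candidates.append((all_on_disk, free_cores, site))
    return max(candidates)[2] if candidates else None


if __name__ == "__main__":
    job_inputs = ["lfn:/grid/demo/run1.dat", "lfn:/grid/demo/run2.dat"]
    print(choose_site(job_inputs, INFO_SYSTEM, CATALOGUE))  # SiteC
```

SiteA has the most free cores but would need a tape recall for `run1.dat`, and SiteB has no spare capacity, so the sketch picks SiteC. Ranking disk ahead of tape mirrors the slide's distinction between immediate and long-term storage; a production broker would typically weigh further attributes.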
Project Overview


17,000 users
136,000 logical CPUs (cores)
25 PB disk
39 PB tape

12 million jobs/month
    +45% in a year
268 sites
    +5% in a year
48 countries
    +10% in a year
162 VOs
    +29% in a year



So what does EGEE actually do?



    • Builds and supports user communities on the grid

          – Training
          – Application Porting
          – User Support


    • Integrates and provides a worldwide infrastructure

          – Software Development
          – Integration, Test & Certification
          – Deployment
          – Operations

    • Collaboration and Technical Leadership worldwide

          – Collaborating Projects
          – Standards
          – Policy


EGEE-III INFSO-RI-222667             Technical Status - Steven Newhouse - EGEE-III First Review 24-25 June 2009   6
Supporting Science


•    Archeology
•    Astronomy
•    Astrophysics
•    Civil Protection
•    Comp. Chemistry
•    Earth Sciences
•    Finance
•    Fusion
•    Geophysics
•    High Energy Physics
•    Life Sciences
•    Multimedia
•    Material Sciences

End-user activity
          – 13,000 end-users in 112 VOs
          – +44% users in a year

[Figure: Resource Utilisation (%) by area, March 2008 to February 2009 vs March 2007 to February 2008: Computational Chemistry, Life Sciences, Multidisciplinary, Astronomy & Astrophysics, Earth Science, Fusion, Other Areas]
Proportion of HEP usage ~77%




Connecting Users to Resources




[Figure: layered stack, top to bottom: Applications, Middleware, Physical Resources (Computers, Disks, Tape)]
gLite Middleware


[Figure: gLite component architecture; the legend distinguishes EGEE-maintained components from external components]

    • Access
          – User Interface
    • Security Services
          – Virtual Organisation Membership Service
          – Proxy Server
          – SCAS (Authz. Service)
          – LCAS & LCMAPS
    • Information Services
          – BDII
          – MON
    • General Services
          – Workload Management Service
          – Logging & Bookkeeping Service
          – File Transfer Service
          – LHC File Catalogue
          – AMGA
          – Hydra
    • Compute Element
          – CREAM
          – LCG-CE
          – BLAH
          – gLExec
          – Worker Node
    • Storage Element
          – Disk Pool Manager
          – dCache
    • Physical Resources



    • Contact:
          – steven.newhouse@cern.ch



