1. Enabling Grids for E-sciencE
Distributed Data and gLite
Steven Newhouse
Technical Director
CERN
www.eu-egee.org
EGEE-III INFSO-RI-222667 EGEE and gLite are registered trademarks
2. The Data Deluge
• Astronomy
• Genomics
• Earth Observation
• Digitisation
[Image: Crab Nebula in X-ray and optical]
EGEE-III INFSO-RI-222667 Data Day - Grid School 2009 2
3. ... And the LHC
4. High throughput data analysis
• Analysing the data
– Large ensemble calculations (hundreds to tens of thousands of jobs)
– Complex workflows in which each step depends on previous steps
• High Throughput
– Exploit distributed computing and storage resources
  – Data replicated at multiple locations
  – Resources selected through a broker
    • WMS: Workload Management System
    • Higher-level tools: GANGA, DIANE, ...
– Information system records the available resources
– File catalogue records the location of replicated files
• Data stored in files
– Growing interest in relational data access
– Stored on tape (long-term) or disk (immediate access)
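Complex workflows, where a step can only start once the steps it depends on have finished, can be scheduled by sorting the jobs topologically. The following is a minimal Python sketch, assuming a hypothetical workflow described as a mapping from each job to the jobs it depends on. It uses the standard-library graphlib module (Python 3.9+) rather than any gLite, GANGA or DIANE interface.

```python
from graphlib import TopologicalSorter

# Hypothetical workflow: each job maps to the set of jobs it depends on.
WORKFLOW = {
    "simulate": set(),
    "reconstruct": {"simulate"},
    "calibrate": set(),
    "analyse": {"reconstruct", "calibrate"},
    "summarise": {"analyse"},
}


def submission_waves(workflow):
    """Group jobs into waves; every job in a wave depends only on earlier waves."""
    sorter = TopologicalSorter(workflow)
    sorter.prepare()  # raises graphlib.CycleError if the dependencies form a loop
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # jobs whose dependencies are all done
        waves.append(ready)
        sorter.done(*ready)  # assume the whole wave finished successfully
    return waves


if __name__ == "__main__":
    for i, wave in enumerate(submission_waves(WORKFLOW), start=1):
        print(f"wave {i}: {', '.join(wave)}")
```

For the example workflow this produces four waves: calibrate and simulate together, then reconstruct, analyse and summarise in turn. Jobs within a wave are independent of each other, so a broker could dispatch them in parallel.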
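Combining the file catalogue with the information system lets a broker send a job to where its data already is. The Python sketch below illustrates this under stated assumptions. The catalogue maps a logical file name to the sites holding a replica, and the information-system snapshot records free cores per site. Both are hypothetical, as are all site and file names. The selection rule is an illustrative simplification, not the actual WMS matchmaking algorithm: it keeps only the sites holding every input replica, then prefers the site with the most free cores.

```python
from dataclasses import dataclass

# Hypothetical file catalogue: logical file name -> sites holding a replica.
CATALOGUE = {
    "lfn:/grid/myvo/run42/events.root": {"CERN", "RAL", "IN2P3"},
    "lfn:/grid/myvo/run42/calib.db": {"CERN", "IN2P3"},
}

# Hypothetical information-system snapshot: free cores per site.
FREE_CORES = {"CERN": 120, "RAL": 800, "IN2P3": 350, "FZK": 1000}


@dataclass
class Job:
    name: str
    input_files: list


def choose_site(job, catalogue, free_cores):
    """Return the site holding all input replicas with the most free cores, or None."""
    candidates = set(free_cores)
    for lfn in job.input_files:
        # Keep only sites that hold a replica of this input file.
        candidates &= catalogue.get(lfn, set())
    usable = sorted(site for site in candidates if free_cores[site] > 0)
    if not usable:
        return None
    # Rank the remaining sites by free cores; ties go to the alphabetically first site.
    return max(usable, key=lambda site: free_cores[site])


if __name__ == "__main__":
    job = Job("analyse-run42", list(CATALOGUE))
    print(choose_site(job, CATALOGUE, FREE_CORES))
```

In this example FZK has the most free cores but holds no replicas. The sketch therefore picks IN2P3, the best-provisioned site that already has both input files, which avoids a wide-area transfer before the job can start.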
5. Project Overview
• 17,000 users
• 136,000 logical CPUs (cores)
• 25 PB disk
• 39 PB tape
• 12 million jobs/month (+45% in a year)
• 268 sites (+5% in a year)
• 48 countries (+10% in a year)
• 162 VOs (+29% in a year)
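Some simple arithmetic turns the headline figures into average rates. The sketch below assumes a 30-day month and decimal petabytes. It also assumes that the +45% growth applies to the job count. The results are averages derived from the slide's numbers, not figures reported by the project.

```python
# Headline figures from the Project Overview slide.
CORES = 136_000
DISK_BYTES = 25e15  # 25 PB, assuming decimal petabytes
JOBS_PER_MONTH = 12_000_000
JOB_GROWTH = 0.45  # +45% in a year, assumed to refer to the job count

DAYS_PER_MONTH = 30  # assumption: a 30-day month
SECONDS_PER_DAY = 86_400

jobs_per_day = JOBS_PER_MONTH / DAYS_PER_MONTH
jobs_per_second = jobs_per_day / SECONDS_PER_DAY
# Monthly job volume implied a year earlier if today's figure is 45% higher.
previous_year_jobs = JOBS_PER_MONTH / (1 + JOB_GROWTH)
# Average disk capacity available per core, in decimal gigabytes.
disk_per_core_gb = DISK_BYTES / CORES / 1e9

if __name__ == "__main__":
    print(f"{jobs_per_day:,.0f} jobs/day, {jobs_per_second:.1f} jobs/s on average")
    print(f"~{previous_year_jobs / 1e6:.1f} million jobs/month a year earlier")
    print(f"~{disk_per_core_gb:.0f} GB of disk per core")
```

Under these assumptions the infrastructure averaged about 400,000 jobs a day, or roughly 4.6 jobs every second, with about 184 GB of disk per core.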
6. So what does EGEE actually do?
• Builds and supports user communities on the grid
– Application porting, user support and training
• Integrates and provides a worldwide infrastructure
– Software development; integration, test & certification; deployment; operations
• Collaboration and Technical Leadership worldwide
– Collaborating projects, standards and policy
EGEE-III INFSO-RI-222667 Technical Status - Steven Newhouse - EGEE-III First Review 24-25 June 2009 6
7. Supporting Science
• Archeology
• Astronomy
• Astrophysics
• Civil Protection
• Comp. Chemistry
• Earth Sciences
• Finance
• Fusion
• Geophysics
• High Energy Physics
• Life Sciences
• Multimedia
• Material Sciences
End-user activity: 13,000 end-users in 112 VOs (+44% users in a year)
[Chart: Resource Utilisation (%) by discipline (Computational Chemistry, Life Sciences, Multidisciplinary, Astronomy & Astrophysics, Earth Science, Fusion, Other Areas), March 2008 to February 2009 vs March 2007 to February 2008]
Proportion of HEP usage: ~77%
8. Connecting Users to Resources
Applications
Middleware
Physical Resources (computers, disks, tape)
9. gLite Middleware
Legend: EGEE-maintained components; external components
• Access: User Interface
• General Services: Workload Management Service, Logging & Bookkeeping Service, File Transfer Service, LHC File Catalogue, AMGA, Hydra
• Information Services: BDII
• Security Services: Virtual Organisation Membership Service, Proxy Server, SCAS, Authz. Service, LCAS & LCMAPS, gLExec
• Compute Element: CREAM, LCG-CE, BLAH, MON
• Storage Element: Disk Pool Manager, dCache
• Worker Node
• Physical Resources
10. Contact
• steven.newhouse@cern.ch