Session 33 - Production Grids

  • Transcript

    • 1. Overview of Production Grids
      Steven Newhouse
    • 2. Contents
      Open Science Grid
      DEISA
      NAREGI
      Nordic DataGrid Facility
      EGEE
      TeraGrid
      EGI
    • 3. Open Science Grid
      Ruth Pordes
    • 4. Open Science Grid
      Consortium - >100 member organizations contributing resources, software, applications, services.
      Project:
      Funded by DOE and NSF to deliver to the OSG Consortium for 5 years (2006-2011), 33 FTEs.
      VO science deliverables are OSG’s milestones.
      Collaboratively focused: Partnerships, international connections, multidisciplinary
      Satellites - independently funded projects contributing to the OSG Consortium program and vision:
      CI-Team User and Campus Engagement,
      VOSS study of Virtual Organizations,
      CILogon integration of end-point Shibboleth identity management into the OSG infrastructure,
      Funding for students to the International Summer School for Grid Computing 2009
    • 5. OSG & Internet2 Work Closely w/ Universities
      (Slide: Paul Avery, TeraGrid'09, Jun. 23, 2009)
      ~100 compute resources
      ~20 storage resources
      ~70 modules in software stack
      ~35 User Communities (VOs)
      600,000-900,000 CPU-hours/day,
      200K-300K jobs/day,
      >2,000 users
      ~5 other infrastructures
      ~25 resource sites.
    • 6. Users
      Nearly all applications are High Throughput; a small number of users are starting MPI production use (see the illustrative sketch after this slide).
      Major accounts: US LHC, LIGO
      ATLAS, CMS >3000 physicists each.
      US ATLAS & US CMS Tier-1, 17 Tier-2s and new focus on Tier-3s (~35 today, expect ~70 in a year)
      ALICE taskforce to show usability of OSG infrastructure for their applications.
      LIGO Einstein@Home
      US Physics Community:
      Tevatron - CDF & D0 at FNAL and remote sites.
      Other Fermilab users – Neutrino, astro, simulation, theory
      STAR
      IceCube
      Non-Physics:
      ~6% of usage. ~25 single PIs or small groups from biology, molecular dynamics, chemistry, weather forecasting, mathematics, protein prediction.
      Campus infrastructures: ~7, including universities and labs.
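      Illustrative aside (not from the slides): a minimal sketch of the many-job, high-throughput submission pattern that Condor-based OSG resources support. The executable, file names and job count are hypothetical, and a working Condor/HTCondor pool is assumed.

import subprocess
import textwrap

# Minimal HTCondor-style submit description for a bag of independent jobs.
# "analyze_event.sh" and the 100-job count are purely illustrative.
submit_description = textwrap.dedent("""\
    universe   = vanilla
    executable = analyze_event.sh
    arguments  = $(Process)
    output     = job_$(Process).out
    error      = job_$(Process).err
    log        = cluster.log
    queue 100
""")

with open("htc_demo.sub", "w") as handle:
    handle.write(submit_description)

# Hand the description to the local scheduler; requires access to a Condor pool.
subprocess.run(["condor_submit", "htc_demo.sub"], check=True)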
    • 7. Non-physics use highly cyclic
    • 8. Operations
      All hardware contributed by members of the Consortium
      Distributed operations infrastructure including security, monitoring, registration, accounting services etc.
      Central ticketing system, 24x7 problem reporting and triaging at the Grid Operations Center.
      Distributed set of Support Centers as first line of support for VOs, services (e.g. software) and Sites.
      Security incident response teams include Site Security Administrators and VO Security Contacts.
      Software distribution, patches (security) and updates.
      Targeted Production, Site and VO support teams.
    • 9. OSG Job Counts (2008-9)
      (Chart: Paul Avery, TeraGrid'09, Jun. 23, 2009.)
      100M jobs in total; ~300K jobs/day.
    • 10. Software
      OSG Virtual Data Toolkit packaged, tested, distributed, supported software stack used by multiple projects – OSG, EGEE, NYSGrid, TG, APAC, NGS.
      ~70 components covering Condor, Globus, security infrastructure, data movement, storage implementations, job management and scheduling, network monitoring tools, validation and testing, monitoring/accounting/information, needed utilities such as Apache, Tomcat;
      Server, User Client, Worker-Node/Application Client releases.
      Build and regression tested using the University of Wisconsin-Madison Metronome system.
      Pre-release testing on 3 “VTB” sites – UofC, LBNL, Caltech
      Post-release testing of major releases on the Integration Testbed
      Distributed team at U of Wisconsin, Fermilab, LBNL.
      Improved support for incremental upgrades in OSG 1.2 release summer ’09.
      OSG configuration and validation scripts distributed to use the VDT.
      OSG does not develop software except for tools and contributions (extensions) to external software projects delivering to OSG stakeholder requirements.
      Identified liaisons provide bi-directional support and communication between OSG and External Software Provider projects.
      OSG Software Tools Group oversees all software developed within the project.
      Software vulnerability and auditing processes in place.
    • 11. VDT Progress (1.10.1 Just Released)
      (Chart: Paul Avery, TeraGrid'09, Jun. 23, 2009.)
      ~70 components
    • 12. Partnerships and Collaborations
      Partnerships with network fabric and identity service providers – ESNET, Internet2
      Continuing bridging work with EGEE, SuraGrid, TeraGrid.
      ~17 points of contact/collaboration with EGEE and WLCG.
      Partnership statement for EGI/NGIs.
      Emerging collaborations with TG on Workforce Training, Software, Security.
      Creator (co-sponsor) of the successful e-weekly International Science Grid This Week.
      Co-sponsor of this ISSGC’09 school.
      Member of Production Infrastructure Policy Group (OGF affiliated).
    • 13. Community Collaboratories
    • 14. DEISA Advancing Science in Europe
      H. Lederer, A. Streit, J. Reetz - DEISA
      RI-222919
      www.deisa.eu
    • 15. DEISA consortium and partners
      Eleven Supercomputing Centres in Europe: BSC, CSC, CINECA, ECMWF, EPCC, FZJ, HLRS, IDRIS, LRZ, RZG, SARA
      Four associated partners: CEA, CSCS, JSCC, KTH
      Co-funded by the European Commission under the DEISA2 contract RI-222919.
    • 16. Infrastructure and Services
      HPC infrastructure with heterogeneous resources
      State-of-the-art supercomputers:
      Cray XT4/5, Linux
      IBM Power5, Power6, AIX / Linux
      IBM BlueGene/P, Linux
      IBM PowerPC, Linux
      SGI ALTIX 4700 (Itanium2 Montecito), Linux
      NEC SX8/9 vector systems, Super UX
      More than 1 PetaFlop/s of aggregated peak performance
      Dedicated network, 10 Gb/s links provided by GEANT2 and NRENs
      Continental shared high-performance filesystem (GPFS-MC, IBM)
      HPC systems are owned and operated by national HPC centres
      DEISA services are layered and operated on top
      Fixed fractions of the HPC resources are dedicated for DEISA
      Europe-wide coordinated expert teams for operation, technology developments, and application enabling and support
    • 17. HPC resource usage
      HPC Applications
      from various scientific fields: astrophysics, earth sciences, engineering, life sciences, materials sciences, particle physics, plasma physics
      require capability computing facilities (low latency, high throughput interconnect), often application enabling and support
      Resources granted through:
      - DEISA Extreme Computing Initiative (DECI, annual calls)
      DECI call 2008
      42 proposals accepted, 50 million CPU-h granted*
      DECI call 2009 (proposals currently under review)
      75 proposals, more than 200 million CPU-h requested*
      *) normalized to IBM P4+
      Over 160 universities and research institutes from 15 European countries with co-investigators from four other continents have already benefitted
      - Virtual Science Community Support
      2008: EFDA, EUFORIA, VIROLAB
      2009: EFDA, EUFORIA, ENES, LFI-PLANCK, VPH/VIROLAB, VIRGO
    • 18. Middleware
      Various services are provided on the middleware layer:
      DEISA Common Production Environment (DCPE)
      (Homogeneous software environment layer for heterogeneous HPC platforms)
      High-performance data stage-in/-out to GPFS: GridFTP (see the illustrative stage-in sketch after this slide)
      Workflow management: UNICORE
      Job submission:
      UNICORE
      WS-GRAM (optional)
      Interactive usage of local batch systems
      remote job submission between IBM P6/AIX systems (LL-MC)
      Monitoring System: INCA
      Unified AAA: distributed LDAP and resource-usage databases
      Only a few software components are developed within DEISA
      Focus on technology evaluation, deployment and operation
      Bugs are reported to the software maintainers
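      Illustrative aside (not part of the DEISA slides): a minimal sketch of the GridFTP stage-in mentioned above, driven from Python through the standard globus-url-copy client. The endpoints and GPFS path are hypothetical, and a valid grid proxy is assumed to exist.

import subprocess

# Hypothetical endpoints: stage an input file from a home site into the
# shared GPFS work area of an HPC site over GridFTP.
SOURCE = "gsiftp://gridftp.home-site.example.org/data/user/input/dataset.tar"
TARGET = "gsiftp://gridftp.hpc-site.example.org/gpfs/work/project42/dataset.tar"

# globus-url-copy copies between GridFTP (gsiftp://) or local (file://) URLs;
# "-p 4" requests four parallel streams to improve wide-area throughput.
subprocess.run(["globus-url-copy", "-p", "4", SOURCE, TARGET], check=True)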
    • 19. Standards
      DEISA has a vital interest in the standardization of interfaces to HPC services
      Job submission, job and workflow management, data management, data access and archiving, networking and security (including AAA)
      DEISA supports OGF standardization groups
      JSDL-WG and OGSA-BES for job submission (a minimal JSDL example follows this slide),
      UR-WG and RUS-WG for accounting
      DAIS for data services
      Engagement in Production Grid Infrastructure WG
      DEISA collaboration in standardization with other projects
      GIN community
      Infrastructure Policy Group (DEISA, EGEE, TeraGrid, OSG, NAREGI)
      Goal: Achievement of seamless interoperation of leading Grid Infrastructures worldwide
      - Authentication, Authorization, Accounting (AAA)
      - Resource allocation policies
      - Portal / access policies
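      Illustrative aside (not from the slides): a minimal JSDL 1.0 job description of the kind the JSDL-WG standardizes, built with Python's standard library. The job name and executable are hypothetical; the namespace URIs follow the JSDL 1.0 specification but should be treated as assumptions to verify.

import xml.etree.ElementTree as ET

JSDL = "http://schemas.ggf.org/jsdl/2005/11/jsdl"          # JSDL 1.0 core (assumed)
POSIX = "http://schemas.ggf.org/jsdl/2005/11/jsdl-posix"   # POSIX application extension (assumed)
ET.register_namespace("jsdl", JSDL)
ET.register_namespace("jsdl-posix", POSIX)

# Build a minimal JobDefinition with a POSIX application section.
job = ET.Element(f"{{{JSDL}}}JobDefinition")
desc = ET.SubElement(job, f"{{{JSDL}}}JobDescription")

ident = ET.SubElement(desc, f"{{{JSDL}}}JobIdentification")
ET.SubElement(ident, f"{{{JSDL}}}JobName").text = "deisa-demo"

app = ET.SubElement(desc, f"{{{JSDL}}}Application")
posix = ET.SubElement(app, f"{{{POSIX}}}POSIXApplication")
ET.SubElement(posix, f"{{{POSIX}}}Executable").text = "/usr/bin/md5sum"
ET.SubElement(posix, f"{{{POSIX}}}Argument").text = "dataset.tar"
ET.SubElement(posix, f"{{{POSIX}}}Output").text = "checksums.txt"

# The resulting document could be handed to an OGSA-BES compliant endpoint.
print(ET.tostring(job, encoding="unicode"))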
    • 20. Status of CSI Grid (NAREGI)
      Kento Aida
      National Institute of Informatics
    • 21. Overview
      Current Status
      We started pilot operation in May 2009.
      Organization
      Computer centers in 9 universities: resource providers
      National Institute of Informatics: network provider (SINET3) and GOC
      Funding
      organizations’ own funding
    • 22. Operational Infrastructure
    • 23. Middleware
      NAREGI middleware Ver. 1.1.3
      developer
      National Institute of Informatics
      ( http://middleware.naregi.org/Download/ )
      platform
      CentOS 5.2 + PBS Pro 9.1/9.2
      OpenSUSE 10.3 + Sun Grid Engine v6.0
    • 24. Nordic DataGrid Facility
      Michael Gronager
    • 25. NDGF Organization
      • A Co-operative Nordic Data and Computing Grid facility
      • 26. Nordic production grid, leveraging national grid resources
      • 27. Common policy framework for Nordic production grid
      • 28. Joint Nordic planning and coordination
      • 29. Operate Nordic storage facility for major projects
      • 30. Co-ordinate & host major eScience projects (i.e., the Nordic WLCG Tier-1)
      • 31. Contribute to grid middleware and develop services
      • 32. NDGF 2006-2010
      • 33. Funded (2 M€/year) by National Research Councils of the Nordic Countries
      (Organization diagram: NOS-N and the national research councils of IS, DK, SE, FI and NO behind the Nordic Data Grid Facility.)
    • 34. NDGF Facility - 2009Q1
    • 35. NDGF People - 2009Q2
    • 36. Application Communities
      • WLCG – the Worldwide Large Hadron Collider Grid
      • 37. Bio-informatics sciences
      • 38. Screening of reservoirs suitable for CO2 sequestration
      • 39. Computational Chemistry
      • 40. Material Science
      • 41. And the more horizontal:
      • 42. Common Nordic User Administration,
      • 43. Authentication,
      • 44. Authorization &
      • 45. Accounting
    • 46. Operations
      • Operation team of 5-7 people
      • 47. Collaboration between NDGF, SNIC and NUNOC
      • 48. Expert support 365 days a year
      • 49. 24x7 by Regional REN
      • 50. Distributed over the Nordics
      • 51. Runs:
      • 52. rCOD + ROC – for Nordic + Baltic
      • 53. Distributed Sites (T1, T2s)
      • 54. Sysadmins well known by the operation team
      • 55. Continuous chatroom meetings
    • 56. Middleware
      • Philosophy:
      • 57. We need tools to run an e-Infrastructure.
      • 58. Tools cost: money or in kind.
      • 59. In kind means Open Source tools
      • 60. – hence we contribute to things we use:
      • 61. dCache (storage) – a DESY, FNAL, NDGF ++ collaboration
      • 62. ARC (computing) – a collaboration between Nordic, Slovenian and Swiss institutes (an illustrative xRSL submission sketch follows this slide)
      • 63. SGAS (accounting) and Confusa (client-cert from IdPs)
      • 64. BDII, WMS, SAM, AliEn, Panda – gLite/CERN tools
      • 65. MonAmi, Nagios (Monitoring)
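      Illustrative aside (not from the slides): a sketch of describing a trivial job in ARC's xRSL dialect and handing it to the arcsub client. The attribute values are hypothetical; a configured ARC client with a valid grid proxy is assumed, and the exact attribute set should be checked against the ARC release in use.

import subprocess
import textwrap

# Trivial job description in classic xRSL syntax; values are illustrative.
xrsl = textwrap.dedent("""\
    &(executable="/bin/hostname")
     (jobname="ndgf-demo")
     (stdout="std.out")
     (stderr="std.err")
     (cputime="10")
""")

with open("demo.xrsl", "w") as handle:
    handle.write(xrsl)

# arcsub picks a suitable computing element (or one can be named explicitly)
# and submits the described job; requires the ARC client tools and a proxy.
subprocess.run(["arcsub", "demo.xrsl"], check=True)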
    • 66. NDGF now and in the future
      • e-Infrastructure as a whole is important
      • 67. Resources include both capacity and capability computing, and different network and storage systems
      • 68. The infrastructure must support different access methods (grid, ssh, application portals, etc.) – note that the average grid use of shared resources is only around 10-25%
      • 69. Uniform User Mgmt, Id, Access, Accounting, Policy Enforcement and resource allocation and sharing
      • 70. Independent of access method
      • 71. For all users
    • Enabling Grids for E-Science
      Steven Newhouse
    • 72. Enabling Grids for E-Science
      (Slide: Bob Jones, Project Status, EGEE-III First Review, 24-25 June 2009.)
      (Diagram: networking, middleware and service activities serving both new scientific communities and established user communities.)
      Duration: 2 years
      Total budget: staff ~47 M€, hardware ~50 M€
      EC contribution: 32 M€
      Total effort: 9,132 person-months (~382 FTE)
    • 73. Project Overview
      17,000 users
      136,000 LCPUs (cores)
      25 PB disk
      39 PB tape
      12 million jobs/month (+45% in a year)
      268 sites (+5% in a year)
      48 countries (+10% in a year)
      162 VOs (+29% in a year)
    • 74. Supporting Science
      Archeology
      Astronomy
      Astrophysics
      Civil Protection
      Comp. Chemistry
      Earth Sciences
      Finance
      Fusion
      Geophysics
      High Energy Physics
      Life Sciences
      Multimedia
      Material Sciences
      Resource Utilisation
      End-user activity
      • 13,000 end-users in 112 VOs
      • 75. +44% users in a year
      • 76. 23 core VOs
      • 77. A core VO has >10% of usage within its science cluster
      Proportion of HEP usage ~77%
    • 78. Operations
      Monitored 24x7 on a regional basis
      Central help desk for all issues
      Filtered to regional and specialist support units
    • 79. gLite Middleware
      (Architecture diagram; per the slide notes, yellow = gLite components, green = externally supported components. An illustrative submission sketch follows this slide.)
      User access: User Interface
      Security services: Virtual Organisation Membership Service (X.509 attributes), proxy server, SCAS, SAML-based authorization service, LCAS & LCMAPS, gLExec, Hydra
      Information services: BDII (GLUE 2.0), MON
      General services: Workload Management Service, Logging & Bookkeeping Service, AMGA, File Transfer Service, LHC File Catalogue, DMI
      Compute: Compute Element – CREAM and LCG-CE (JSDL & BES interfaces), BLAH, Worker Node
      Storage: Storage Element (SRM), Disk Pool Manager, dCache
      Layered over the physical resources.
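      Illustrative aside (not part of the original slides): a sketch of how an end user might drive this stack, writing a JDL job description and submitting it to the Workload Management Service with the standard client. The file names are hypothetical; a configured gLite User Interface and an existing VOMS proxy (e.g. created with voms-proxy-init --voms <vo>) are assumed.

import subprocess
import textwrap

# Minimal JDL description; attribute values are illustrative.
jdl = textwrap.dedent("""\
    Executable    = "/bin/hostname";
    StdOutput     = "std.out";
    StdError      = "std.err";
    OutputSandbox = {"std.out", "std.err"};
""")

with open("demo.jdl", "w") as handle:
    handle.write(jdl)

# "-a" delegates the user's proxy automatically; the WMS endpoint is taken
# from the User Interface configuration. Requires gLite UI tools and a proxy.
subprocess.run(["glite-wms-job-submit", "-a", "demo.jdl"], check=True)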
    • 80. TeraGrid
      Daniel S. Katz
    • 81. TeraGrid Overview
      TeraGrid is run by 11 resource providers (RPs) and integrated by Grid Infrastructure Group (GIG, at University of Chicago)
      TeraGrid Forum (made of these 12 entities) decides policy by consensus (elected chair is John Towns, NCSA)
      Funding is by separate awards from the National Science Foundation to the 12 groups
      GIG sub-awards integration funding to the 11 RPs and some additional groups
      Resources (distributed at the 11 RPs across the United States, connected by 10 Gbps paths)
      14 HPC systems (1.6 PFlops, 310 TBytes memory)
      1 HTC pool (105,000 CPUs)
      7 storage systems (3.0 PBytes on-line, 60 PBytes off-line)
      2 viz systems (128 tightly integrated CPUs, 14,000 loosely coupled CPUs)
      Special purpose systems (GPUs, FPGAs)
    • 82. Applications Community
      (Usage charts from 2006 and 2008.)
      Primarily HPC usage, but growing use of science gateways and workflows, lesser HTC usage
    • 83. Operations Infrastructure
      Lots of services to keep this all together
      Keep most things looking like one system to the users, including:
      Allocations, helpdesk, accounting, web site, portal, security, data movement, information services, resource catalog, science gateways, etc.
      Working on, but don’t have in production yet:
      Single global file system, identity management integrated with universities
      Services supported by GIG, resources supported by RPs
    • 84. Middleware
      Coordinated TeraGrid Software Stack is made of kits
      All but one (Core Integration) are optional for RPs
      Kits define a set of functionality and provide an implementation
      Optional Kits: Data Movement, Remote Login Capability, Science Workflow Support, Parallel Application Capability, Remote Compute, Application Development and Runtime Support Capability, Metascheduling Capability, Data Movement Servers Capability, Data Management Capability, Data Visualization Support, Data Movement Clients Capability, Local Resource Provider HPC Software, Wide Area GPFS File Systems, Co-Scheduling Capability, Advance Reservation Capability, Wide Area Lustre File Systems, Science Gateway Kit
      Current status: (kits along the top, resources along the left side; yellow means the kit is installed, white means the kit is not installed)
      Some kits are now being rolled out (Science Gateway) and will become more widely used; some have limited functionality (Data Visualization Support) that only makes sense on some resources
    • 85. TeraGrid
      TeraGrid considers itself the world’s largest open scientific computing infrastructure
      Usage is free, allocations are peer-reviewed and available to all US researchers and their collaborators
      TeraGrid is a platform on which others can build
      Application developers
      Science Gateways
      TeraGrid is a research project
      Learning how to do distributed, collaborative science on a continental-scale, federated infrastructure
      Learning how to run multi-institution shared infrastructure
    • 86. Common Characteristics
      Operating a production grid requires WORK
      Monitoring, reporting, chasing, ...
      No ‘off the shelf’ software solution
      Plenty of components... But need verified assembly!
      No central control
      Distributed expertise leads to distributed teams
      Resources are federated → ownership lies elsewhere
      No ownership by the Grid of hardware resources
      All driven by delivering to user communities
    • 87. The Future in Europe: EGI
      EGI: European Grid Initiative
      Result of the EGI Design Study
      2 year project to build community consensus
      Move from project to sustainable funding
      Leverage other sources of funding
      Build on national grid initiatives (NGIs)
      Provide the European ‘glue’ around independent NGIs
    • 88. The EGI Actors
      (Diagram: the EGI actors – research teams and research institutes as users, resource centres, the National Grid Initiatives (NGI1, NGI2, ..., NGIn), and EGI.eu, which together constitute EGI.)
    • 89. EGI.eu and NGI Tasks
      (Diagram: EGI.eu coordinates the NGIs; tasks are split into EGI.eu tasks, NGI international tasks and NGI national tasks.)
    • 90. Differences between EGEE & EGI
      (Diagram: EGI.eu, NGI Operations, Specialised Support Centres, and the European Middleware Initiative (EMI).)
    • 91. Middleware
      EGI will release UMD
      Unified Middleware Distribution
      Components needed to build a production grid
      Initial main providers:
      ARC, gLite & UNICORE
      Expect to evolve components over time
      Have defined interfaces to enable multiple providers
      EMI project from ARC, gLite & UNICORE
      Supports, maintains & harmonises software
      Introduction & development of standards
    • 92. Current Status
      • 8th July: EGI Council Meeting:
      • 93. Confirmation of Interim Director
      • 94. Establish Editorial team for the EC Proposals
      • 95. 30th July: EC Call opens
      • 96. 1st October: Financial contributions to the EGI Collaboration due
      • 97. October/November: EGI.eu established
      • 98. 24th November: EC Call closes
      • 99. December 2009/January 2010: EGI.eu startup phase
      • 100. Winter 2010: Negotiation phase for EGI projects
      • 101. 1st May 2010: EGI projects launched
    • 102. European Future?
      Sustainability
      E-Infrastructure is vital
      Will underpin many research activities
      Activity has to be driven by active stakeholders