Jorge Gomes



  1. Laboratório de Instrumentação e Física Experimental de Partículas
     GRID, PaaS for e-science? (thoughts about grids and clouds)
     Jorge Gomes
     CloudViews, Porto, May 2010
  2. A view from 1960
     “It seems reasonable to envision, for a time 10 or 15 years hence, a ‘thinking center’ that will incorporate the functions of present-day libraries together with anticipated advances in information storage and retrieval and the symbiotic functions suggested earlier in this paper. The picture readily enlarges itself into a network of such centers, connected to one another by wide-band communication lines and to individual users by leased-wire services. In such a system, the speed of the computers would be balanced, and the cost of the gigantic memories and the sophisticated programs would be divided by the number of users.”
     Man-Computer Symbiosis, J.C.R. Licklider, 1960
  3. About LIP
     • LIP is a Portuguese scientific research laboratory:
       – High Energy Physics (HEP)
       – Associated laboratory funded by the Portuguese public funding agencies
       – Private non-profit association
       – Created in 1986 when Portugal joined CERN
     • LIP participation in physics experiments includes:
       – ATLAS, CMS, COMPASS, Auger, AMS, SNO, ZEPLIN, HADES, …
     • Other activities include:
       – Building DAQ systems and particle detectors, detector R&D, medical physics, Geant4
       – Electronics, precision mechanics, grid computing
     [Image: CERN - LHC]
  4. About the LHC
     • The Large Hadron Collider (LHC) is the largest scientific instrument on Earth:
       – Located at CERN on the Swiss/French border
       – 27 km in circumference
       – At 100 meters depth (average)
       – 600 million particle collisions per second
       – Reproducing the energy density that existed just a few moments after the Big Bang
       – Beams of protons will collide at an energy of 14 TeV
       – Beams of lead nuclei will collide at an energy of 1150 TeV
     • Objective:
       – Probe deeper into the structure of matter than ever before
       – Understand fundamental questions about the universe
     • Four experiments working in parallel:
       – ATLAS, CMS, ALICE, LHCb
  5. The LHC computing challenge
     • Data volume
       – High rate * large number of channels * 4 experiments → 15 petabytes of new data each year
     • Compute power
       – Event complexity * number of events * thousands of users → 100k of (today's) fastest CPUs
     • Worldwide analysis & funding
       – Computing funding locally in major regions & countries
       – Efficient analysis everywhere
       – Hundreds of locations → GRID computing
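To put the 15 petabytes per year from the slide above into perspective, the sustained average rate it implies can be worked out with a few lines of arithmetic. This is only a back-of-the-envelope sketch: it assumes decimal units (1 PB = 10^15 bytes) and a 365-day year, and the function name `average_rate_mb_per_s` is made up for this illustration.

```python
# Back-of-the-envelope: average rate implied by a yearly data volume.
# Assumptions (not from the slides): decimal units and a 365-day year.

SECONDS_PER_YEAR = 365 * 24 * 3600  # 31,536,000 s


def average_rate_mb_per_s(petabytes_per_year: float) -> float:
    """Return the sustained average rate in MB/s (decimal megabytes)."""
    bytes_per_year = petabytes_per_year * 10**15
    return bytes_per_year / SECONDS_PER_YEAR / 10**6


if __name__ == "__main__":
    # 15 PB/year is the figure quoted on the slide.
    print(f"{average_rate_mb_per_s(15):.0f} MB/s")  # roughly 476 MB/s
```

The real rate is of course far from constant; the figure is only an average over the whole year.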
  6. LIP and Grid Computing projects: Grid Infrastructure Projects Timeline
     [Timeline figure, 2001 to 2010. Projects shown: DataGrid, CrossGrid, LCG, EGEE-I, EELA, EGEE-II, Int.Eu.Grid, EGEE-III, EGI-InSPIRE]
  7. Today WLCG is a success
     • Running increasingly high workloads:
       – Jobs in excess of 650k / day; anticipate millions / day soon
       – CPU equiv. ~100k cores
     • Workloads are:
       – Real data processing
       – Simulations
       – Analysis: more and more (new) users
     • Data transfers at unprecedented rates
  8. 267 sites, 55 countries, 150,000 CPUs, 28 PB online, 41 PB offline, 16,000 users, 200 VOs, 660,000 jobs/day
     • Disciplines: Archeology, Astronomy, Astrophysics, Civil Protection, Comp. Chemistry, Earth Sciences, Finance, Fusion, Geophysics, High Energy Physics, Life Sciences, Multimedia, Material Sciences, …
     • Was the largest multidisciplinary grid
     • Now being replaced by EGI-InSPIRE
  9. Middleware
     • Security
       – Virtual Organization Management (VOMS)
       – MyProxy
     • Data management
       – File catalogue (LFC)
       – File transfer service (FTS)
       – Storage Element (SE)
       – Storage Resource Manager (SRM)
     • Job management
       – Workload Management System (WMS)
       – Logging and Bookkeeping (LB)
       – Computing Element (CREAM CE, LCG CE)
       – Worker Nodes (WN)
     • Information System
       – Monitoring: BDII (Berkeley Database Information Index), R-GMA (Relational Grid Monitoring Architecture) → aggregate service information from multiple Grid sites, now moved to SAM (Site Availability Monitoring)
       – Monitoring & visualization (Gridview, Dashboard, Gridmap etc.)
  10. European Grid Initiative (EGI)
      • European Grid Initiative replacing EGEE
        – Scientific grid computing sustainability in Europe:
          • Grid computing in Europe is now critical for many scientific communities
          • Grid computing in Europe must not depend on isolated short-term projects
        – New organizational model with two layers:
          • National Grid Initiatives (NGIs) funded and managed by the governments
          • European Grid Initiative funded by the NGIs and by the EU
      • EGI headquarters has been established in Amsterdam
      • The transition is happening now!
      LIP workshop 2010
  11. National Grid Initiatives
      • Most NGIs are not limited to grid computing:
        – Distributed computing → current focus is grid computing
        – HPC
        – HTC
        – Applications
        – Network provisioning for distributed computing
      • There is a growing interest in cloud computing:
        – by the NGIs
        – by the research communities
  12. European Grid Initiative
      • The 48-month EGI-InSPIRE (Integrated Sustainable Pan-European Infrastructure for Researchers in Europe) project will continue the transition to a sustainable pan-European e-Infrastructure started in EGEE-III. It will sustain support for Grids of high-performance and high-throughput computing resources, while seeking to integrate new Distributed Computing Infrastructures (DCIs), i.e. Clouds, SuperComputing, Desktop Grids, etc.
      • Future technologies will include the integration of cloud resources (from either commercial or academic providers) into the production infrastructure offered to the European research community.
      • Exploratory work to see how cloud computing technologies could be used to provision gLite services is already taking place within EGEE-III between the EC FP7 funded RESERVOIR project and the StratusLab collaboration. Work using the Azure environment from Microsoft will be explored through the VENUS-C project.
  13. European Grid Initiative
      • D2.6) Integration of Clouds and Virtualisation into the European production infrastructure: Provide a roadmap as to how clouds and virtualisation technology could be integrated into the EGI, exploring not only the technology issues but also the total costs of ownership of delivering such resources. [month 8]
  14. Portuguese NGI Main Site
      • Electrical power:
        – 2000 kVA
        – 6x 200 kVA UPSs
        – Emergency power generation
      • Chilled water cooling:
        – Chillers with free cooling
        – Close-control units
      • Other characteristics:
        – Computer room area 370 m²
        – Fire detection and extinction
        – Access control
        – Remote monitoring and alarms
      • Computing resources:
        – Cluster HTC and HPC
        – Online storage
        – Offline storage
        – Services
        – Housing
  15. Lessons – user communities
      • Grid has been very successful for some user communities
      • High Energy Physics matches grid computing perfectly:
        – Large user community
        – Excellent technical skills
        – Very structured and well organized
        – Users share common goals
        – Users share common data
        – Willing to share and collaborate
        – Distributed users and resources (geographically and administratively)
        – Huge amounts of data to process
      • They have a motivation and a reward for sharing resources!
  16. Lessons – user communities
      • This is not valid for many other user communities:
        – Small number of users (sometimes one single user)
        – Not structured, sometimes even in direct competition
        – Communities that are not very distributed
        – Isolated peaks of activity instead of sustained usage
        – No tradition of cooperation (also sociological)
      • They have low motivation to share resources
      • Sometimes it is possible to create common VOs for them:
        – A good example is the EGEE biomed VO, which includes many independent researchers under the global coordination of EGEE
  17. Lessons – business model
      • The model for the scientific grids is based on virtual organizations (VO), i.e. user communities:
        – Users organize themselves and create VOs
        – Users integrate their own resources
        – Users share their resources with the other VO members
        – They might share resources with other VOs
        – There is no pay-per-use model
        – There is no economic model
      [Diagram: VOs and their resources attached to the EGEE / EGI grid infrastructure bus, with core services, dissemination, middleware, VO support, operations, helpdesk and training on top]
  18. Lessons – business model
      • Reduced motivation for the resource providers:
        – No reward for providers not related to VOs
        – Most frequently, providers only share if they have a local user community that needs grid computing and pushes for it
        – Providers tend to commit the minimum possible resources
        – Small capacity to provide elasticity for the VOs
  19. Lessons – grid achievements
      • Technical
        – Standards effort is very important (Open Grid Forum)
        – Interoperability (not perfect or complete but very valuable)
        – Sophisticated data and resource management
        – Worldwide authentication for scientific grids
        – Common usage and security policies
        – Many developments for privacy and security
        – Powerful European infrastructure based on the Géant network
      • European policy and coordination structure
        – Creation of national grid initiatives supported by the governments
        – Creation of the European Grid Initiative
        – Model for long-term sustainability
  20. Lessons – grid complaints
      • Mostly oriented towards batch processing
      • Complex architecture
      • Steep learning curve
      • Hard to deploy, maintain and troubleshoot
      • May require considerable human resources to operate and use
      • Creation of new VOs is a heavy task
      • Several middleware stacks without full interoperability (gLite, ARC, UNICORE, Globus, ...)
      • Applications may require some degree of porting
      • Not very user friendly
      • Reduced range of supported operating systems
      • Too heavy for small sites
      • Too heavy for users without very large processing requirements
  21. Grids & Clouds
      • Elastic grid infrastructures:
        – Complement native grids with grids on top of clouds
        – Plan the physical infrastructure for sustained loads and use cloud services to accommodate usage peaks
        – Better elasticity for the VOs
        – Lower costs and higher capacity for peak usage
        – Native grids for HPC and HTC, clouds for HTC
        – Valid both for grid sites and grid infrastructures
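The elastic model described on the slide above (size the physical infrastructure for the sustained load and send usage peaks to a cloud) can be illustrated with a simple split of demand. This is a minimal sketch, not an actual scheduler: the function name `split_load`, the demand figures and the local capacity of 200 cores are all hypothetical.

```python
# Illustrative only: split a demand profile between a fixed local grid
# site and an elastic cloud that absorbs whatever exceeds local capacity.

def split_load(demand_cores, local_capacity):
    """Return (local, cloud) lists of cores used in each time slot.

    demand_cores   -- cores requested in each time slot (hypothetical data)
    local_capacity -- cores available on the physical infrastructure
    """
    if local_capacity < 0:
        raise ValueError("local_capacity must be non-negative")
    local = [min(d, local_capacity) for d in demand_cores]
    cloud = [max(d - local_capacity, 0) for d in demand_cores]
    return local, cloud


if __name__ == "__main__":
    demand = [100, 300, 250]          # made-up demand, in cores
    local, cloud = split_load(demand, local_capacity=200)
    print("local:", local)            # local: [100, 200, 200]
    print("cloud:", cloud)            # cloud: [0, 100, 50]
```

Only the slots where demand exceeds the local capacity generate cloud usage, which is the part that would be paid for on demand.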
  22. Grids & Clouds
      • Fully clouded grid infrastructures:
        – Full grid infrastructure on top of clouds
        – Fully dynamic → only pay for / allocate what is needed
        – Grid managers have fewer worries about the underlying infrastructure and can concentrate on the service
        – Easier deployment if releases are cloud oriented
        – Unfortunately this is still problematic for HPC
        – When using commercial providers:
          • Possibly more resilient infrastructures
          • Might not be so interesting for high sustained loads
          • Requires careful estimation of costs and economies
        – But valid for a scientific/academic cloud service ...
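The remark above that commercial providers might not be so interesting for high sustained loads can be made concrete with a break-even estimate. The sketch below is an assumption-laden illustration, not a costing method from the talk: it assumes an owned core has a fixed yearly cost regardless of use, while a cloud core is billed per hour actually used. The names `break_even_utilization` and `cheaper_option` and all prices are hypothetical.

```python
# Illustrative break-even between owning a core and renting it per hour.
# All cost figures used with these functions are hypothetical.

HOURS_PER_YEAR = 8760  # 365 days * 24 hours


def break_even_utilization(owned_cost_per_core_year: float,
                           cloud_price_per_core_hour: float) -> float:
    """Fraction of the year a core must be busy for owning to cost the
    same as renting. Below this fraction the cloud is cheaper."""
    if owned_cost_per_core_year < 0 or cloud_price_per_core_hour <= 0:
        raise ValueError("costs must be positive")
    return owned_cost_per_core_year / (cloud_price_per_core_hour * HOURS_PER_YEAR)


def cheaper_option(utilization: float,
                   owned_cost_per_core_year: float,
                   cloud_price_per_core_hour: float) -> str:
    """Return 'cloud' or 'owned' for a given average utilization (0..1)."""
    threshold = break_even_utilization(owned_cost_per_core_year,
                                       cloud_price_per_core_hour)
    return "cloud" if utilization < threshold else "owned"


if __name__ == "__main__":
    # Made-up figures: 438 per core-year owned, 0.10 per core-hour rented.
    print(cheaper_option(0.2, 438.0, 0.10))  # low, bursty usage
    print(cheaper_option(0.9, 438.0, 0.10))  # high sustained load
```

A real estimate would also have to account for data transfer, storage, staff and the other concerns listed on the following slides.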
  23. Grids & Clouds
      • In science there are additional issues:
        – Is there money to pay for services?
        – Is there money to pay for hardware?
        – Is there money to pay for human resources?
        – Is there money for maintenance?
        – Is there money for the electricity?
        – We only have big money once in a while ...
        – Budgets and projects last one year, so we don’t know if there will be money next year ...
      • Requires:
        – Careful estimation of costs
        – Detailed and accurate planning
        – Sustained funding
  24. Grids & Clouds
      • In science there are additional concerns:
        – Black box: we don’t know the architecture and scalability behind the commercial clouds
        – Lack of standard interfaces (provider lock-in)
        – Performance for very data-intensive applications
        – Low latency for parallel applications
        – Privacy, security and availability concerns
        – Legal concerns
        – Network bandwidth to the commercial Internet
        – Future/evolution of the clouds and their costs
  25. Grids & Clouds
      • Support for special cases:
        – As an NGI we get requests from all types of users with all sorts of requirements ...
        – A more generic approach to support all sorts of computing requirements is welcome
        – Users with small/medium computing requirements who don’t want to mess with grid computing
        – Users who need very specific or non-supported distributed computing middleware
        – Users who need very specific software environments
        – Users who want to do things other than computing
  26. Grids & Clouds
      • Attract more resource providers:
        – Bring in resource providers not rewarded by grid
        – Cloud computing is more generic than grid
        – It may bring in more academic / scientific resource providers
        – Being generic is more advantageous: everybody can use it for something
        – These new resources could then also be used to provide grid over the cloud
        – An economic model that allows providers to earn credits for the CPU they provide
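The slide above only names the idea of an economic model in which providers earn credits for the CPU they provide; it does not describe one. Purely as an illustration of what such bookkeeping could look like, here is a toy ledger in which a site earns one credit per CPU hour provided and spends one per CPU hour consumed. The class `CreditLedger`, the one-to-one exchange rate and the site names are all assumptions.

```python
# Toy credit ledger: sites earn credits for CPU hours they provide and
# spend them when they consume CPU hours elsewhere. Hypothetical design.

class CreditLedger:
    def __init__(self):
        self._balance = {}  # site name -> credits (1 credit = 1 CPU hour)

    def record_provided(self, site: str, cpu_hours: float) -> None:
        """Credit a site for CPU hours it made available to others."""
        self._adjust(site, cpu_hours)

    def record_consumed(self, site: str, cpu_hours: float) -> None:
        """Debit a site for CPU hours it used on other sites' resources."""
        self._adjust(site, -cpu_hours)

    def balance(self, site: str) -> float:
        """Current balance; unknown sites start at zero."""
        return self._balance.get(site, 0.0)

    def _adjust(self, site: str, delta: float) -> None:
        self._balance[site] = self.balance(site) + delta


if __name__ == "__main__":
    ledger = CreditLedger()
    ledger.record_provided("site-a", 1000)   # hypothetical site names
    ledger.record_consumed("site-a", 250)
    ledger.record_consumed("site-b", 100)
    print(ledger.balance("site-a"))  # 750.0
    print(ledger.balance("site-b"))  # -100.0
```

A negative balance marks a site that has consumed more than it has contributed, which is the kind of signal a pay-per-use or credit policy could act on.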
  27. Grids & Clouds
      • Increase flexibility
        – Use the same resources to support a wide range of users
          • Grid users with many different requirements and needs
          • Generic scientific computing users (non-grid)
          • Other types of needs
        – Optimize infrastructure use
          • Use free resources on grid and non-grid computing clusters
        – Preservation of data and processes
          • Capability to resurrect older grid and non-grid computing environments to run legacy applications
        – More power to the end users
          • Let the users choose and take care of their needs
          • Let operations people concentrate on running the infrastructure
  28. Summary
      • Most building blocks do exist for “clouded grids”
      • There is motivation to do it
      • Several scenarios already demonstrated
      • At LIP we run OpenNebula in our grid infrastructure
      • Clouds are very suitable for HTC grid applications
      • Some of the concerns related to clouds are not very different from the ones mentioned for grids
      • For commercial grids further analysis is needed
      • Evolution points in the direction of providing virtual grid infrastructures on clouds
  30. EGEE in Portugal
      • Portugal and Spain make up the EGEE Southwest federation
      • LIP has coordinated EGEE activities in the country since EGEE-I
        – Infrastructure operations coordination
        – User and site support
        – Infrastructure services
        – Training
        – Security and authentication
      • The Portuguese sites are:
        – LIP (Lisbon)
        – LIP (Coimbra)
        – UP (Porto)
        – DI-Uminho (Braga)
        – Uminho-CP (Braga)
        – IEETA (Aveiro)
        – CFP-IST (Lisbon)
        – Univ Lusíada (Famalicão)
  31. LIP
      • Besides the Tier-2, LIP also provides computing resources for other research activities, namely within grid projects
        – AUGER, SNO, COMPASS, CMS, ATLAS, medical physics, ESA, AMS, etc.
        – LCG, EGEE, EELA, NGI, IBERGRID, etc.
      • LCG has been the driving force behind grid computing in the country
      • The Portuguese federated Tier-2 is composed of 3 sites:
        – LIP-Lisbon
        – LIP-Coimbra
        – NGI main node for grid computing
  32. LIP Tier-2 topology (Lisbon)
      [Topology diagram; recoverable labels:]
      • Services: Computing Element, SRM + GSIFTP doors, Monitoring Box, Site BDII, gLite
      • SGE farm: 37x HP DL160G5, 14x HP DL160G6
      • Other systems: SUN X2200, SUN X4100, DELL PE1950
      • Force10 core switch
      • SRM storage: LUSTRE + STORM, DELL storage servers, ~40 TB each
      • About 600 cores
  33. Virtualization
      • Why
        – Encapsulation
        – Provide multiple environments for multiple applications and services
        – More dynamic resource allocation profiting from existing resources
      • Where and how
        – Computing farm (mixture of real and virtual resources)
        – To host grid (or other) persistent services
        – For testing purposes
        – Xen paravirtualization mostly
      • Things being explored
        – More dynamic approach to virtual resources for classic farm computing (enable multiple environments tailored to the VOs)
        – A flexible framework for persistent virtual machines enabling resilient service provisioning
        – Cloud computing services on top of the existing resources (virtual machines on demand)
        – KVM (Kernel-based Virtual Machine): native virtualization with Intel VT and AMD-V; Red Hat bought Qumranet, the developer of KVM, and seems to be betting on it
  34. Iniciativa Nacional Grid (INGRID)
      • Initiative from the Portuguese Ministry of Science
        – Launched in April 2006 in the context of “Ligar Portugal”, a larger initiative for the information society
        – Managed by the government agencies FCT and UMIC
      • Bodies from the Ministry of Science:
        – FCT is the Portuguese Science Foundation
        – UMIC is the Portuguese Knowledge Society Agency
        – Technical coordination by LIP and UMIC
      • Main objectives:
        – Reinforce the national competence and capacity in the grid computing domain
        – Enable the use of grid computing for complex problem solving
        – Integrate Portugal in international grid computing infrastructures
        – Reinforce the multidisciplinary collaboration among research communities
        – Promote conditions for commercial companies to find know-how in the grid computing domain in the country
  35. INGRID projects
      • G-Cast: Application of GRID-computing in a coastal morphodynamics nowcast-forecast system
      • GridClass - Learning Classifiers Systems for Grid Data Mining
      • PoliGrid - distributed policies for resource management in Grids
      • Collaborative Resources Online to Support Simulations on Forest Fires (CROSS-Fire): a Grid Platform to Integrate Geo-referenced Web Services for Real-Time Management
      • GRID for ATLAS/LHC data simulation and analysis
      • GERES-med: Grid-Enabled REpositorieS for medical applications
      • BING – Brain Imaging Network Grid
      • GRITO – A Grid for preservation
      • PM#GRID - GRID Platform Development for European Scale Satellite Based Air Pollution Mapping
      • AspectGrid: Pluggable Grid Aspects for Scientific Applications
      • P-found: GRID computing and distributed data warehousing of protein folding and unfolding simulations
  36. INGRID+
      • Create an autonomous NGI grid infrastructure
      • Infrastructures and projects: EGI, IBERGRID, LCG, etc.
      • Core resources: INGRID main node
      • Existing resources (EGEE, EELA, ... INGRID projects ...)
      • Other resources
      • Users:
        – INGRID projects
        – Virtual organizations (national and international)
        – Other users with demanding computing requirements
  37. Setup of NGI Core Resources
      • Core resources initially composed of three grid clusters:
        – Main node for grid computing
          • New facility located at LNEC
        – Grid resources provided by the LIP computer centre in Lisbon
          • Located at the LIP facilities in Lisbon
        – Additional grid resources provided by the LIP computer centre in Coimbra
          • Located at the CFC datacentre in the University of Coimbra
      • Support for the integration of computing resources in the country:
        – Initially focus on existing resource centres
        – Expand to other sites at a later stage
        – Concentrate on gLite resources
        – Funding line to support the organizations providing resources
  38. Main node - Location
      • The main node is being built by a consortium of research organizations under the Portuguese NGI context:
        – LIP, FCCN, LNEC
      • The project started in 2007.
      • Some components are already operational.
      • It will become officially operational in the coming weeks.
      • The centre is located at the LNEC campus, very near the FCCN NOC in Lisbon
      • Excellent network connectivity:
        – FCCN national backbone
        – Géant PoP
  39. Main Node Facility – Details
      • Facilities to house computing equipment are very expensive
        – 900K Euro in equipment
        – More than 1200K Euro in the facility (low construction cost)
      • Operational costs are also very heavy
        – Electrical power for cooling and all the systems
        – Environmental impact is also relevant
      • Optimization is very important
        – Minimize electrical power losses
        – Maximize effectiveness of cooling systems
      • Measures
        – Chillers + free cooling
        – Minimize mixture of hot and cold air
        – Careful set point selection for air conditioning
        – Highly efficient UPS systems and power supplies
        – Use blade centers for higher power efficiency
        – Power efficiency study (look at reusing heat or other forms of generating power)
        – Look at ways to turn systems off/on dynamically
  40. Main node - Computing resources
      • Setup
        – Tape library LTO-4
          • Grid accessible data repositories
          • Hierarchical storage
        – Core grid services
          • Two blade centers
          • 192 CPU cores
        – Grid cluster
          • HTC and HPC blades
          • ~1250 CPU cores for processing
        – Online grid storage
          • Server direct attached storage
          • ~620 TB raw + 70 TB raw SAN
        – Local network
          • Core 10 Gigabit Ethernet
          • Non-blocking, wire-speed, low latency
        – Resources from other organizations:
          • LNEC grid cluster
  41. Computing Resources
      • High Throughput Computing Servers
        – IBM blades:
          • 2 quad-core AMD Opteron 2356 processors
          • 2 quad-core Intel Xeon E5420 processors
        – HP blades:
          • 2 quad-core Intel Xeon X5550 processors
        – 3 GB of RAM per core (24 GB per blade)
        – Running SL5 x86_64
      • High Performance Computing Servers
        – IBM blades:
          • 2 quad-core AMD Opteron 2356 processors
          • Infiniband
          • 4 GB of RAM per core (32 GB per blade)
          • Running SL5 x86_64
  42. Storage Resources
      • Storage servers and expansion boxes
        – IBM X3650 servers running SL5 x86_64
          • 2 quad-core Intel(R) Xeon(R) L5420 CPUs
          • 2 SAS disks deployed in RAID mirror
          • 10 Gigabit Ethernet adapters
          • Each server has 40 TB of associated effective storage
          • Expansion boxes in RAID 5 volumes with 1 TB SATA-II disks
          • Total of ~620 TB of online grid storage space
        – HP DL360 servers running SL5 x86_64
          • 2 quad-core Intel(R) Xeon(R) L5420 CPUs
          • 2 SAS disks deployed in RAID mirror
          • 10 Gigabit Ethernet adapters
          • Each server has 40 TB of associated effective storage
          • Expansion boxes with 450 GB SAS disks
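The slide above distinguishes raw from effective storage and mentions RAID 5 volumes built from 1 TB disks. As a reminder of where the difference comes from, a RAID 5 volume of n equal disks uses the equivalent of one disk for parity, so its usable capacity is (n - 1) times the disk size. The sketch below applies that rule; the function name `raid5_usable_tb` and the 12-disk volume size are assumptions, since the slides do not say how many disks each volume holds.

```python
# Usable capacity of a RAID 5 volume: one disk's worth of space goes to
# parity. The 12-disk volume in the example is a hypothetical layout.

def raid5_usable_tb(disks: int, disk_size_tb: float) -> float:
    """Return usable capacity in TB for a RAID 5 volume of equal disks."""
    if disks < 3:
        raise ValueError("RAID 5 needs at least 3 disks")
    if disk_size_tb <= 0:
        raise ValueError("disk size must be positive")
    return (disks - 1) * disk_size_tb


if __name__ == "__main__":
    # 12 disks of 1 TB each: 12 TB raw, 11 TB usable before filesystem overhead.
    print(raid5_usable_tb(12, 1.0))  # 11.0
```

Filesystem formatting and spare disks reduce the effective figure further, which is one reason raw and effective totals differ.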
  43. Main node for grid computing - schema
      [Schema diagram; recoverable labels:]
      • Computing blades: SGE cluster with HPC and HTC blades
      • Support services: CORE blades
      • Core 10 Gigabit Ethernet switch
      • Storage = Lustre + StoRM
      • 1st phase: ~1250 CPU cores, ~620 TB raw
      Jornadas RCTS 2010
  44. Middleware
      • What do we want:
        – Interoperability with other organizations
        – Long-term support
        – Reliability
        – Low cost
      • Choice:
        – Long term: may depend on decisions taken at European level in EGI (UMD)
        – Short term: use gLite
        – Medium term: consider other user needs
      • gLite:
        – Possibly the most used middleware in European and other grid infrastructures
        – Already being used by the Portuguese resource centres in EGEE, Int.Eu.Grid and EELA
        – gLite developers participate actively in the international standardization bodies
      • But ...
        – Difficult to deploy and maintain, some reliability issues, too HEP-centric
        – We will integrate additional components when needed
        – MPI support with Int.Eu.Grid middleware extensions
        – Cloud computing can be a good complementary technology
  45. European Grid Initiative (EGI)
      • EGI-DS
        – European Grid Initiative planning
        – Portugal in policy board (UMIC)
      • EGI
        – Portugal is a member through UMIC
        – First national fee paid
      • EGI-InSPIRE
        – Integrated Sustainable Pan-European Infrastructure for Researchers in Europe
        – Under EU negotiations
        – Main project for infrastructure coordination and operation
      • EGI-InSPIRE international tasks
        – International bid for global tasks
        – Portugal and Spain in the middleware rollout coordination
  46. The EGI-InSPIRE Project: Integrated Sustainable Pan-European Infrastructure for Researchers in Europe
      • A 4-year project with €25M EC contribution
        – Project cost €69M
        – Total effort ~€330M
        – Staff ~170 FTE
      • Project partners (48): 37 NGIs, 2 EIROs, 8 AP
      [Chart legend: Funded, Un-Funded]
  47. The European Grid Initiative
      [Diagram; recoverable labels: EGI.eu, NGI, Users, VOs, Virtual Research Community (VRC), User Community Board, Helpdesk, NGI Helpdesk, Training Events, Trainers, Apps. DB, ESFRI Project, Other]
  48. IBERGRID
      • Is a common Portuguese/Spanish Iberian infrastructure
      • IBERGRID will provide an umbrella for an Iberian regional grid
        – Integrating Portuguese and Spanish NGI resources
        – Fully interoperable with EGI
      • Focus is now on the IBERGRID development as a requirement for a successful common participation in EGI
        – Towards a sustainable model but without losing synergies and advantages
      • Current status
        – Main grid core services have been deployed in both countries
        – The initial set of virtual organizations has been created
        – Several sites are already configured to support IBERGRID VOs
        – Testing of this pilot infrastructure is ongoing
  49. IBERGRID and EGI
      [Diagram: the Portuguese grid initiative and the Spanish grid initiative joined under IBERGRID]
      • IBERGRID = grid computing, HPC, applications, networks, volunteer computing
  50. Iberian transition plan
      • Portugal
        – IBERGRID common VOs management and coordination
        – Operations portal
        – Catalogues and services for the IBERGRID VOs
        – Certification Authority for Portugal (LIPCA)
      • Spain
        – Helpdesk (Request Tracker)
        – Monitoring and accounting
        – Infrastructure database (GOCDB/HGSM)
        – Certification Authority (PKIrisGrid)
        – Middleware security
      • Common
        – Core services and redundancy
        – Regional information system
        – Support groups
        – Operations coordination
        – Training infrastructure
        – Infrastructure security
        – Seed resources for new users