Your SlideShare is downloading. ×
The glideinWMS approach to the ownership of System Images in the Cloud World
Upcoming SlideShare
Loading in...5
×

Thanks for flagging this SlideShare!

Oops! An error has occurred.

×
Saving this for later? Get the SlideShare app to save on your phone or tablet. Read anywhere, anytime – even offline.
Text the download link to your phone
Standard text messaging rates apply

The glideinWMS approach to the ownership of System Images in the Cloud World

189
views

Published on

Presentation at CLOSER 2012. …

Presentation at CLOSER 2012.

Scientific communities that are accustomed to use Grid resources are now considering the use of Cloud resources. However, moving from the Grid to the Cloud brings along the need for the creation and maintenance of the system image used to configure the provisioned resources, and this presents both opportunities and problems for the users. The impact is especially interesting in the context of glideinWMS due to its layered architecture. This presentation describes the various options available to the glideinWMS project team, their advantages and disadvantages, and explains why one of them is to be preferred.

Closer web page: http://closer.scitevents.org/

Published in: Technology

0 Comments
0 Likes
Statistics
Notes
  • Be the first to comment

  • Be the first to like this

No Downloads
Views
Total Views
189
On Slideshare
0
From Embeds
0
Number of Embeds
0
Actions
Shares
0
Downloads
1
Comments
0
Likes
0
Embeds 0
No embeds

Report content
Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
No notes for slide

Transcript

  • 1. CLOSER 2012 The glideinWMS approach to the ownership of System Images in the Cloud World presented by Igor Sfiligoi1 co-authors A.Tiradani2, B.Holzman2 and D.C.Bradley3 1 UC San Diego, 2Fermilab, 3UW MadisonCLOSER 2012 - Apr 2012 glideinWMS - Ownership of VM 1
  • 2. Our users ● Our primary target audience is the scientific community ● In particular, those communities that need massive amounts of CPU cycles ● Large number of users - O(1k) ● Organized in groups (known as VOs) Virtual Organizations ● Batch processing is a must ● Typical user task requires > 1 CPU year ● Users understand the need of splitting the problem in many (semi-)independent tasksCLOSER 2012 - Apr 2012 glideinWMS - Ownership of VM 2
  • 3. Our environment ● World-wide distributed computing a must ● No single resource provider can provide enough CPU for all the users – Both due to logistical and political constraints ● Until now this meant Grid computing ● i.e. federation of independent batch sys. providers ● and essentially all research-funded ● Our VOs are now considering Cloud computing, too Cloud==IaaS ● Both commercial and research-fundedCLOSER 2012 - Apr 2012 glideinWMS - Ownership of VM 3
  • 4. Grid vs Cloud ● Similarities ● Both are a way to provision resources ● Differences relevant to this talk ● OS and system ● C Only bare (virtualized) libraries provided by the resource provider LO hardware provided by the resource provider ID ● User just executes his ● U Must install OS before R payload running the actual D G – Must play well with payload provided OS – But more flexibilityCLOSER 2012 - Apr 2012 glideinWMS - Ownership of VM 4
  • 5. Grid vs Cloud ● Similarities ● Both are a way to provision resources ● Differences relevant to this talk ● OS and system ● Only bare (virtualized) C libraries provided by the resource provider hardware provided by the resource provider LO ID ● User just executes hisgroups Must install OS before Some user ● find U R payload the cloud model easier running the actual D G – Must play well with payload provided OS – But more flexibilityCLOSER 2012 - Apr 2012 glideinWMS - Ownership of VM 5
  • 6. Why can Cloud be easier? ● Counterintuitive ● Installing and maintaining a whole OS a big task! ● Yet, easier than trying to adapt existing scientific application ● Most code not actively maintained ● Often written making system assumptions ● Plus, most (virtualized) hardware uniform ● And HW API variations are usually much smaller than OS API variationsCLOSER 2012 - Apr 2012 glideinWMS - Ownership of VM 6
  • 7. Why can Cloud be easier? ● Counterintuitive ● Installing and maintaining a whole OS a big task! ● Yet, easier than trying to adapt existing scientific application But someone still has to manage ● Most codebatch jobs maintained the not actively ● Often written making system assumptions ● Plus, most (virtualized) hardware uniform ● Enter And HW API variations are usually much smaller than OS API variations glideinWMSCLOSER 2012 - Apr 2012 glideinWMS - Ownership of VM 7
  • 8. glideinWMS ● An overlay batch system on top of dynamic resources ● Based on the pilot principle ● Hides provisioning Provider A from final users ● Looks and feels like Glidein a batch system resource pool on top of dedicated Provider A resources to them Provider C Provider B http://tinyurl.com/glideinWMS Sfiligoi, I. et al., (2009). The pilot way to grid resources using glideinwms. In Computer Science and Information Engineering, 2009 WRI World Congress on, 2, pp.428-32. doi:10.1109/CSIE.2009.950CLOSER 2012 - Apr 2012 glideinWMS - Ownership of VM 8
  • 9. glideinWMS architecture ● Three independent components ● Glidein Factory – interface to resources ● VO Frontend – resource provisioning logic ● The actual WMS – seen by the final users Workload Management System == Batch system VO Frontend Monitor Request WMS Glidein Factory Job Pilot Pilot Resource Job Providers Pilot Pilot PilotCLOSER 2012 - Apr 2012 glideinWMS - Ownership of VM 9
  • 10. glideinWMS architecture ● Three independent components ● Glidein Factory – interface to resources N-to-M ● VO Frontend – resource provisioning logic relationship ● The actual WMS – seen by the final users Workload Management System == Batch system VO Frontend VO Frontend Monitor Request WMS WMS Glidein Glidein Factory Factory VO Frontend Job Pilot Glidein Factory VO Frontend WMS Pilot Resource Job Providers Pilot Pilot PilotCLOSER 2012 - Apr 2012 glideinWMS - Ownership of VM 10
  • 11. Multiple operation teams ● Few groups operate the whole glideinWMS ● Typically separate groups for ● Glidein Factory ● VO Frontend + WMS Sfiligoi, I. et al., (2011). Reducing the human cost of grid computing with glideinWMS. In CLOUD COMPUTING 2011, pp. 217-21.CLOSER 2012 - Apr 2012 glideinWMS - Ownership of VM 11
  • 12. Multiple operation teams ● Few groups operate the whole glideinWMS ● Typically separate groups for ● Glidein Factory ● VO Frontend + WMS ● Glidein Factory typically generic ● Essentially an abstraction layer to resources ● Was the interface to Grid resources – Adding the support for Cloud resources now Sfiligoi, I. et al., (2011). Reducing the human cost of grid computing with glideinWMS. In CLOUD COMPUTING 2011, pp. 217-21. Andrews, W. et al., (2011). Early experience on using glideinWMS in the cloud. J. Phys.: Conf. Ser. 331 062014CLOSER 2012 - Apr 2012 glideinWMS - Ownership of VM 12
  • 13. Multiple operation teams ● Few groups operate the whole glideinWMS ● Typically separate groups for ● Glidein Factory ● VO Frontend + WMS ● Glidein Factory typically generic ● VO Frontend the actual brain ● Contains resource provisioning logic ● Provides pilot credential(s) ● Customizes the pilotSfiligoi, I. et al., (2011). Adapting to the Unknown With a few Simple Rules: The glideinWMS Experience. In ADAPTIVE 2011, pp. 25-28.Jan 2012 VO Frontend Training Session at UCSDCLOSER 2012 - Apr 2012 glideinWMS - Ownership of VM 13
  • 14. Multiple operation teams ● Few groups operate the whole glideinWMS ● Typically separate groups for ● Glidein Factory ● VO Frontend + WMS ● Glidein Factory typically generic So... ● VO Frontend the actual brain when moving to the Cloud, who should own the OS image?CLOSER 2012 - Apr 2012 glideinWMS - Ownership of VM 14
  • 15. Option 1 – The Grid approach ● The easiest path forward is to mimic the Grid approach ● Nobody, but the closest layer aware of the Cloud ● This implies ● OS image owned by the Glidein Factory – Since it is the one closest to the Cloud ● Drop superuser privileges as soon as the system boots up – Since superuser ops not allowed in the Grid paradigmCLOSER 2012 - Apr 2012 glideinWMS - Ownership of VM 15
  • 16. Option 1 - AnalysisAdvantages: Disadvantages: ● VO unaware of the Cloud ● VO cannot take advantage of Cloud flexibility ● Only system services ● Pilot cannot take full have superuser privileges advantage of OS features – Minimize risk ● OS management shared ● More work for the between multiple VOs Factory admins – Better quality – May need to maintain – Lower security risk several OS imagesCLOSER 2012 - Apr 2012 glideinWMS - Ownership of VM 16
  • 17. Option 2 – Adding root privileges ● One step further is allowing pilot to run as root (i.e. superuser) ● This implies ● OS image still owned by the Glidein Factory ● But pilot process now a system service ● VO can customize OS configs, e.g. – Install new software – Configure and start otherwise-dormant system services ● Side effect of VO Frontend being able to customize the pilot ● Since running as root, it can change anything in the system ● Acceptable, since pilot running with credential provided by the VO FECLOSER 2012 - Apr 2012 glideinWMS - Ownership of VM 17
  • 18. Option 2 - AnalysisAdvantages: Disadvantages: ● VO only marginally aware ● VO cannot provide the of the Cloud OS of their choice ● VO can change ● Increased security risk system settings – Dynamic system configuration – And the pilot now a proper – Superuser processes batch system daemon ● VO must maintain any software it installs Added effort ● ● OS management shared ● More work for the Increased ● security risk between multiple VOs Factory admins – Better quality – May need to maintain – Lower security risk several OS imagesCLOSER 2012 - Apr 2012 glideinWMS - Ownership of VM 18
  • 19. Option 3 – Pure Cloud model ● Move OS handling down to the “user” ● The VO Frontend admins in glideinWMS We do not want to expose it to the scientists ● This implies ● OS owned by the VO Frontend ● Glidein Factory still needs to manipulate it – Add pilot bits – Proper Cloud-provider-specific contextualization ● Pilot process now a system service Not strictly needed, but very little reason not toCLOSER 2012 - Apr 2012 glideinWMS - Ownership of VM 19
  • 20. Option 3 - AnalysisAdvantages: Disadvantages: ● VO can now provide ● VO has to provide an custom OS image OS image Expertise problem ● The pilot is a proper ● Increased security risk batch system daemon – Dynamic system configuration – Superuser processes ● Less work for the ● More work for VO(s) Glidein Factory admins – Must maintain the VO image ● Potential image in Glidein Factory manipulation problems – Given arbitrary OS imageCLOSER 2012 - Apr 2012 glideinWMS - Ownership of VM 20
  • 21. The network traffic cost ● Most Cloud providers charge users both for CPU and WAN network usage ● Network was “free” in the Grid world ● Data prestaging required to keep costs down ● Data hosting still cheaper than networking ● This includes software (including the OS) ● Implications on the OS ownership options: ● VO-owned software likely more dynamic – Thus higher costCLOSER 2012 - Apr 2012 glideinWMS - Ownership of VM 21
  • 22. Making a choice ● No option a clear winner ● Each option has significant advantages and disadvantages ● Still want to pick one ● To focus development ● As the recommended But may end up supporting all of them deployment strategy in the codeCLOSER 2012 - Apr 2012 glideinWMS - Ownership of VM 22
  • 23. Option 2 chosen ● Option 3 not suitable for current user base ● Not enough expertise in VOs to maintain full OS Manpower also ● Happy with only a few OS versions a major issue ● Option 1 too restrictive ● Some VOs are eager to tweak system settings – Major reason to look at the Cloud ● Major drawback of option 2 is security risk Compared to Option 1 ● Can be mitigated by using superuser privileges only when absolutely neededCLOSER 2012 - Apr 2012 glideinWMS - Ownership of VM 23
  • 24. Future work ● The glideinWMS project about to release the first Cloud-enabled version ● Works fine in internal tests ● But no real-world Cloud experience yet ● Just inferring from our Grid-based deployments ● Eager to put it in real-user hands ● And collect their feedbackCLOSER 2012 - Apr 2012 glideinWMS - Ownership of VM 24
  • 25. Conclusions ● Our user base lives in batch system paradigm ● Extensive experience with Grid world ● But would like to expand into Cloud resources ● OS ownership the major change ● glideinWMS architecture allows for delegating the ownership to an intermediate service ● And lower the migration cost ● But is it a good idea? ● Yes, if some customization allowedCLOSER 2012 - Apr 2012 glideinWMS - Ownership of VM 25
  • 26. Acknowledgments ● This work is partially sponsored by ● the US National Science Foundation under Grants No. OCI-0943725 (STCI), PHY-1104549 (AAA), and PHY-0612805 (CMS Maintenance & Operations) and ● the US Department of Energy under Grants No. DE-SC0002298 (ANDSL) and DE-FC02- 06ER41436 subcontract No. 647F290 (OSG).CLOSER 2012 - Apr 2012 glideinWMS - Ownership of VM 26
  • 27. Backup slidesCLOSER 2012 - Apr 2012 glideinWMS - Ownership of VM 27
  • 28. glideinWMS in Numbers ● OSG Glidein Factory at UCSD ● Serving ~10 FrontendsCLOSER 2012 - Apr 2012 glideinWMS - Ownership of VM 28
  • 29. glideinWMS in Numbers ● CMS Frontend at CERN ● Using 3 Glidein FactoriesCLOSER 2012 - Apr 2012 glideinWMS - Ownership of VM 29

×